DOCUMENT RESUME 



ED 335 366 



TM 016 960 



AUTHOR          Shealy, Robin; Stout, William
TITLE           A Procedure To Detect Test Bias Present Simultaneously in Several Items.
INSTITUTION     Illinois Univ., Urbana. Dept. of Statistics.
SPONS AGENCY    Office of Naval Research, Arlington, VA. Cognitive and Neural Sciences Div.
PUB DATE        25 Apr 91
NOTE            42p.
PUB TYPE        Reports - Evaluative/Feasibility (142)

EDRS PRICE      MF01/PC02 Plus Postage.
DESCRIPTORS     *Ability; Computer Simulation; *Equations (Mathematics); *Item Bias;
                Item Response Theory; *Mathematical Models; *Testing Problems
IDENTIFIERS     Ability Estimates; Mantel Haenszel Procedure;
                *Simultaneous Item Bias Procedure



ABSTRACT 

A statistical procedure is presented that is designed 
to test for unidirectional test bias existing simultaneously in 
several items of an ability test, based on the assumption that test 
bias is incipient within the two groups' ability differences. The 
proposed procedure — Simultaneous Item Bias (SIB) — is based on a 
multidimensional item response theory (IRT) approach. SIB 
statistically tests for bias in one or more items at a time, and is 
corrected for the inflation (or deflation) of the test statistic due 
to target ability difference, a valid group difference that is 
conceptually independent of psychological test bias. The correction 
plays the same role as does the practice of including the single 
studied item in the matching criterion score in the Mantel-Haenszel 
(MH) procedure that is adapted for test responses. It is shown 
through the initial portion of an extensive simulation study in 
progress with 84 cases that, with the correction in place, the 
procedure performs as well as does the MH procedure in many cases 
when there is a single biased item, and it performs well in the case 
of multiple item test bias. Twelve tables present data from the 
simulation, and four graphs illustrate study findings. A 20-item list 
of references is included. (Author/SLD) 






REPORT DOCUMENTATION PAGE (DD Form 1473, JUN 86; Form Approved, OMB No. 0704-0188)

1a. Report security classification: Unclassified
3. Distribution/availability of report: Approved for public release; distribution unlimited
4. Performing organization report number: 1991 - #3
6a. Name of performing organization: University of Illinois, Department of Statistics
6c. Address: 101 Illini Hall, 725 S. Wright St., Champaign, IL 61820
7a. Name of monitoring organization: Cognitive Science Program, Office of Naval Research (Code 1142CS)
7b. Address: 800 N. Quincy St., Arlington, VA 22217-5000
8c. Address: 101 Illini Hall, 725 S. Wright St., Champaign, IL 61820
9. Procurement instrument identification number: N00014-90-J-1940
10. Source of funding numbers: program element no. 61153N; project no. RR04204; task no. RR04204-01; work unit accession no. 4421-548
11. Title: A Procedure to Detect Item Bias Present Simultaneously in Several Items
12. Personal author(s): Robin Shealy and William Stout
13a. Type of report: Technical
14. Date of report: April 25, 1991
15. Page count: 35
16. Supplementary notation: Software to carry out the procedure is available from the authors.
18. Subject terms: See reverse
19. Abstract: See reverse
20. Distribution/availability of abstract: Unclassified/unlimited
22a. Name of responsible individual: Dr. Charles E. Davis






ABSTRACT 



This paper presents a statistical procedure (denoted by SIB) designed to test for uni- 
directional test bias existing simultaneously in several items of an ability test. It was 
argued in Shealy and Stout (1991) that in order to model such bias with an IRT model, a 
multidimensional model is necessary. The proposed procedure, based on this multidimen- 
sional IRT modeling approach, statistically tests for bias in one or more items at a time 
and is corrected for the inflation (or deflation) of the test statistic due to target ability 
difference, a valid group difference that is conceptually independent of psychological test 
bias. The correction plays the same role as the practice of including the single studied 
item in the "matching criterion" score in the Mantel-Haenszel (MH) procedure adapted 
for test responses by Holland and Thayer (1988). It is shown through the initial portion of 
an extensive simulation study underway (Shealy (1991)) that, with the correction in place, 
the procedure performs as well as the MH procedure in many cases when there is a single 
biased item, and performs well in the case of multiple item test bias. 



Key Words: item bias, test bias, DIF, latent trait theory, item response theory, target abil- 
ity, valid subtest, nuisance determinants, potential for bias, expressed bias, unidirectional 
test bias, bidirectional test bias, SIB, Mantel-Haenszel. 



INTRODUCTION 



The purpose of this paper is to present a statistical procedure (denoted by SIB for 
simultaneous item bias) for detecting bias present in one or more test items of a standard- 
ized ability test. The procedure is based on the multidimensional item response theory 
(IRT) model of test bias presented in Shealy and Stout (1991). By "test bias" we mean 
a formalization of the intuitive idea that a test is less valid for one group of examinees 
than for another group in its attempt to assess examinee differences in a prescribed la- 
tent trait, such as mathematics ability. Test bias is conceptualized herein as the result of 
individually-biased items acting in concert through a test scoring method, such as number 
correct, to produce a biased test. 

Two distinct features of this conceptualization of bias are as follows. First, it provides 
a mechanism for explaining how several individually-biased items can combine through a 
test score to exhibit a coherent and major biasing influence at the test level. In partic- 
ular, this can be true even if each individual item displays only a minor amount of item 
bias. For example, word problems on a mathematics test that are too dependent on so- 
phisticated written English comprehension could combine to produce pervasive test bias 
against English-as-a-second-language examinees. A second feature, possible because of our 
multidimensional modeling approach, is that the underlying psychological mechanism that 
produces bias is addressed. This mechanism lies in the distinction made between the abil- 
ity the test is intended to measure, called the target ability, and other abilities influencing 
test performance that the test does not intend to measure, called nuisance determinants. 
Test bias will be seen to occur because of the presence of nuisance determinants possessed 
in differing amounts by different examinee groups. Through the presence of these nuisance 
determinants, bias then is expressed in one or more items. 

The test bias detection procedure can simultaneously assess bias in several items, 
thus addressing the above two features. In contrast, most item bias procedures detailed 
in the literature perform tests on a single item at a time: The pseudo IRT procedure 
of Linn and Harnish (1981) estimates possibly group-dependent item response functions 
(IRFs) without the use of item parameter estimation algorithms when the sample size is 
too small for their use. Thissen, Steinberg, and Wainer (1988) employ marginal maximum 
likelihood estimation to obtain group-dependent item parameters in a 3-parameter logistic 
framework and use the likelihood ratio test to test the equality of the parameters across
groups. The Mantel-Haenszel procedure, adapted for test response data by Holland and
Thayer (1988) and now in wide use, uses the score on the entire test, rather than the score
on the non-studied items, as the "matching criterion" to test for item bias; other single-item
procedures exist as well. Conceivably these procedures could be used once for each item in a set
of items being tested for bias, and multiple comparison procedures could be employed to 
assess the hypothesis of the entire set being biased. However, if the amount of bias is small 



in each item, a multiple comparison procedure may not pick up bias in the set of items at 
all. Moreover this approach cannot address underlying causal mechanisms of bias. 

The novelty of our approach to detecting test bias lies not so much with its recognition 
of the role of nuisance determinants in the expression of test bias, but rather in its explicit 
use of a multidimensional model to motivate the procedure to detect it. The presence of 
multidimensionality of test item responses where bias is present has long been recognized 
in test and item bias studies: Lord (1980) states "if many of the items [in a test] are found 
to be seriously biased, it appears that the items are not strictly unidimensional" (p. 220). 
Recently, Lautenschlager and Park (1988) employed a technique of generating simulated 
biased item responses using a method of Ansley and Forsyth (1985), which involves using 
multidimensional item response functions (IRFs) and latent ability distributions to determine
conditional probabilities of correct response. Kok (1988), taking a multidimensional
viewpoint similar to Shealy and Stout (1991), presents a specific multidimensional IRT 
model for bias where the nuisance determinants are compensating abilities, contextual 
abilities such as language, and testwiseness. 

An important issue addressed by our procedure is that a careful distinction is made be- 
tween genuine test bias, often operationally embodied as DIF (Holland and Thayer (1988)) 
by practitioners, and non-bias differences in examinee group performance, sometimes called 
impact (see, for example, Ackerman (1991) for a careful discussion of impact as distinct 
from bias), that are caused by examinee group differences in target ability distributions. 
It is important that the latter not be mistakenly labeled as test bias. The procedure 
developed herein makes this distinction in its application. 



FORMULATION OF TEST BIAS 



Test bias in this paper is modeled using a multidimensional item response theory 
(IRT) model, which is assumed to be the model behind the observed test responses. For 
purposes of exposition, we restrict ourselves to the case where there is a single nuisance 
determinant; this two-dimensional modeling approach is often realistic in practice. Exten- 
sions to multiple nuisance determinants are straightforward. For a fuller treatment of the 
conception of test bias, including the case of multiple nuisance determinants and item bias 
cancellation, in a more general framework, see Shealy and Stout (1991) and Shealy (1989). 

We consider two biologically- or sociologically-defined groups, named "reference" and 
"focal" groups (after Holland and Thayer's (1988) naming convention). A random sample 
of examinees is drawn from each group, and a test of N items is administered to them. 
Typically it is suspected that a part of the test is biased against the focal group; this 
group is usually the object of the bias study. The responses to the test items from a 
randomly chosen examinee are denoted $\mathbf{U} = (U_1, \ldots, U_N)$, where each $U_i$ takes the
value 0 or 1 according as the response to item $i$ is incorrect or correct, respectively.

The IRT model in general is composed of two components that generate $\mathbf{U}$: (1) a
$d$-dimensional examinee ability parameter and (2) a set of item response functions (IRFs), one
for each item, which determine the probabilities of correct response for the items. Here we
restrict the model to have $d = 1$ or 2, because we are considering a single nuisance determinant
in addition to the target ability. The ability vector is $(\theta, \eta)$ for an arbitrary examinee
from either group, where $\theta$ denotes target ability and $\eta$ denotes the nuisance determinant.
A distribution of $(\theta, \eta)$ over the combined group of examinees is induced by choosing
examinees at random; the ability vector of a randomly chosen examinee is denoted $(\Theta, \eta)$.
The IRF for item $i$ is denoted $P_i(\theta, \eta)$; it is assumed that all items depend on $\theta$
and that one or more may depend on $\eta$; for those dependent only on $\theta$, the IRF is written
$P_i(\theta)$. It is implicitly assumed that an IRT representation for $\mathbf{U}$ in terms of
$(\Theta, \eta)$ and $\{P_i(\theta, \eta) : i = 1, \ldots, N\}$ is possible; for a fuller treatment of
this assumption, see Shealy (1989). In addition, it is assumed that each $P_i(\theta, \eta)$ is
increasing in $(\theta, \eta)$ when item $i$ depends on both abilities and increasing in $\theta$
when it depends on $\theta$ alone, and that each $P_i(\theta)$ is differentiable. Finally, local
independence of $\mathbf{U}$ given $(\Theta, \eta)$ is assumed.

Test bias in the above model is formulated through three components:

(a) the potential for bias, if it exists, resides within the target ability/nuisance determinant
distributions of the two groups being studied;

(b) potential for bias is expressed in items whose responses depend on the nuisance
determinant;¹ and

(c) the scoring method of the test, to be viewed as an estimate of target ability, transmits
expressed item biases into test bias.

¹ We remark that Kok's (1988) formulation is also based upon (a) and (b); Kok's formulation
and ours were developed independently of one another.

Potential for test bias can be explained informally as follows. After conditioning on a
particular $\theta$, suppose that the reference group has a higher level of nuisance ability on
average than the focal group. Then those reference group examinees with ability $\theta$ would
have an overall advantage over the corresponding focal group examinees when responding to
items at least partially dependent on the nuisance determinant $\eta$ (because the IRFs
$P_i(\theta, \eta)$ are increasing in $\eta$). Formally, we define the potential for test bias at
$\theta$:

Definition 1. Potential for test bias exists against the focal group at target ability level $\theta$
with respect to $\eta$ if $\eta \mid \Theta = \theta, G = F$ is stochastically less than
$\eta \mid \Theta = \theta, G = R$, where "$G = F$" denotes sampling from the focal group and
"$G = R$" sampling from the reference group. Potential for bias exists against the reference
group if the converse holds.

Note that we are restricting consideration to conditional nuisance distributions
$\eta \mid \Theta = \theta, G = R$ and $\eta \mid \Theta = \theta, G = F$ that are stochastically
ordered, that is, whose distribution functions do not cross. Figure 1 displays two distributions
that are stochastically ordered and also two distributions that are not.
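Stochastic ordering can be checked numerically by comparing distribution functions: $A$ is stochastically smaller than $B$ when $F_A(x) \ge F_B(x)$ for every $x$. The Python sketch below assumes, purely for illustration, that the conditional nuisance distributions are normal with hypothetical parameters; because it compares the CDFs only on a finite grid, it is an approximation rather than a proof of ordering.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of Normal(mean, sd), computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def stochastically_smaller(cdf_a, cdf_b, grid):
    """True if distribution A is stochastically smaller than B on the grid, i.e.
    F_A(x) >= F_B(x) at every grid point, so the two CDFs never cross there."""
    return all(cdf_a(x) >= cdf_b(x) for x in grid)

grid = [i / 100.0 for i in range(-600, 601)]

def ref_cdf(x):         # hypothetical: eta | Theta = theta, G = R ~ Normal(0, 1)
    return normal_cdf(x, 0.0, 1.0)

def focal_cdf(x):       # hypothetical: eta | Theta = theta, G = F ~ Normal(-0.5, 1), shifted down
    return normal_cdf(x, -0.5, 1.0)

def focal_wide_cdf(x):  # lower mean but larger spread, so the CDFs cross
    return normal_cdf(x, -0.5, 2.0)

print(stochastically_smaller(focal_cdf, ref_cdf, grid))       # True
print(stochastically_smaller(focal_wide_cdf, ref_cdf, grid))  # False
```

The second pair illustrates why a lower mean alone does not guarantee ordering: the wider focal distribution puts more mass in the upper tail, so its CDF dips below the reference CDF for large $\eta$.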



place Figure 1 about here 



In order for test bias to occur, it must be expressed in one or more items. Our definition
of expressed bias for an item, when specialized to Kok's model, coincides with that of
Kok (1988, p. 269). It is defined in terms of a marginalization of the multidimensional
IRF $P_i(\theta, \eta)$.

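Definition 2. Let $P_i(\theta, \eta)$ be the IRF for item $i$. The marginal IRF for group $g$
($g = R$ or $F$) with respect to target ability $\theta$ is defined as

$$T_{ig}(\theta) = E[P_i(\theta, \eta) \mid \Theta = \theta, G = g]. \qquad (1)$$

When $\eta \mid \Theta = \theta, G = g$ has a conditional density, $f_g(\eta \mid \theta)$ say,
Definition 2 translates into

$$T_{ig}(\theta) = \int_{-\infty}^{\infty} P_i(\theta, \eta) \, f_g(\eta \mid \theta) \, d\eta.$$

Definition 3. Expressed bias for item $i$ against the focal group occurs at target ability $\theta$
if $T_{iF}(\theta) < T_{iR}(\theta)$; it occurs against the reference group if the converse holds.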
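To make Definitions 2 and 3 concrete, the Python sketch below computes a marginal IRF by numerical integration over $\eta$ and checks for expressed bias. The compensatory logistic form of $P_i(\theta, \eta)$, the 1.7 scaling constant, the item parameters, and the normal conditional nuisance distributions are all hypothetical choices for illustration; the paper does not commit to them here.

```python
import math

def irf_2d(theta, eta, a1=1.0, a2=0.5, b=0.0, c=0.0):
    """Hypothetical compensatory two-dimensional logistic IRF P_i(theta, eta).
    With a1, a2 > 0 it is increasing in both abilities, as the model requires;
    a2 = 0 gives an item that depends on theta alone."""
    z = 1.7 * (a1 * theta + a2 * eta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def marginal_irf(theta, eta_mean, eta_sd=1.0, n_points=2000, **item):
    """T_ig(theta) = integral of P_i(theta, eta) f_g(eta | theta) d eta (Definition 2),
    approximated by the midpoint rule on eta_mean +/- 8 eta_sd. Assumes that
    eta | Theta = theta, G = g is Normal(eta_mean, eta_sd)."""
    lo = eta_mean - 8.0 * eta_sd
    width = 16.0 * eta_sd / n_points
    total = 0.0
    for j in range(n_points):
        eta = lo + (j + 0.5) * width
        total += irf_2d(theta, eta, **item) * normal_pdf(eta, eta_mean, eta_sd) * width
    return total

for theta in (-1.0, 0.0, 1.0):
    t_ref = marginal_irf(theta, eta_mean=0.0)   # reference group
    t_foc = marginal_irf(theta, eta_mean=-0.5)  # focal group: less nuisance ability
    print(f"theta={theta:+.1f}  T_iR={t_ref:.3f}  T_iF={t_foc:.3f}  "
          f"expressed bias against F: {t_foc < t_ref}")
```

Setting `a2=0.0` makes the item depend on $\theta$ alone; the two marginal IRFs then coincide, in line with component (b): potential for bias is expressed only through items that depend on the nuisance determinant.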

A test can consist of many items simultaneously biased by the same nuisance determi- 
nant. In this case, items can cohere and act through the prescribed test score to produce 
substantial bias against a particular group even if individual items display undetectably 
small amounts of item bias. This is the final (and novel) component of our formulation of 
test bias mentioned above. We consider the large class of test scores of the form 

$$h(\mathbf{U}), \qquad (2)$$

where $h(\mathbf{u})$ is real valued with domain $\mathbf{u} = (u_1, \ldots, u_N)$ such that $u_i = 0$ or 1
for $i = 1, \ldots, N$, and $h(\mathbf{u})$ is coordinatewise non-decreasing in $\mathbf{u}$. This class
contains many of the standard scoring procedures for many standard models; for example,
number correct, linear formula scoring of the form $\sum_{i=1}^{N} a_i U_i$ with $a_i > 0$, maximum
likelihood estimation of ability for certain logistic models with item parameters assumed
known, etc. In this paper we restrict attention to number correct as the test score; the
results presented herein are easily extendable to other forms of $h(\mathbf{u})$. The key point
about number correct scoring is that each item is weighted equally. Thus, if a subset of the
items is suspected of bias, we should give equal weight to the items in this "studied"
subtest in our attempt to quantitatively assess the amount of test bias resulting from the
simultaneous influence of these items. We thus define test bias for a specified studied
subtest of items as follows:

Definition 4. Let $\{U_{i_1}, U_{i_2}, \ldots, U_{i_b}\}$ be any subtest of items to be studied for bias
from the test of concern and define

$$h(\mathbf{U}) = \sum_{j=1}^{b} U_{i_j}. \qquad (3)$$

Then this studied subtest of items displays test bias against the focal group at $\theta$ if

$$E[h(\mathbf{U}) \mid \Theta = \theta, G = F] < E[h(\mathbf{U}) \mid \Theta = \theta, G = R].$$






The subtest is biased against the reference group if the converse holds. 

Finally, the components of the bias formulation can be integrated using the following 
theorem, adapted from Theorem 4.2 in Shealy and Stout (1991):

Theorem 1. Fix a target ability $\theta$ and choose the subtest scoring method $h(\mathbf{u})$ of the
form (3). Assume potential for bias against the focal group at $\theta$ holds (Definition 1). Then
test bias exists against the focal group; i.e.,

$$\sum_{j=1}^{b} E[U_{i_j} \mid \Theta = \theta, G = F] < \sum_{j=1}^{b} E[U_{i_j} \mid \Theta = \theta, G = R]. \qquad (4)$$

In order to test for bias of the above form, there must be an implicit assumption that a
portion of the test measures only the target ability; otherwise a conditional-on-observed
score procedure to detect bias is not possible. This set of items will be denoted the valid 
subtest. The issue of the existence and identification of a valid subtest is extremely difficult 
to frame philosophically (it is really an issue of construct validity) and must primarily be 
an empirical decision based on expert opinion or data at least in part external to the test 
being studied; it is not dealt with here. For a fuller discussion, see Shealy and Stout (1991). 
For notational simplicity we take the valid subtest to consist of the first $n < N$ items of
the test, and we call the remaining $N - n$ items the studied subtest. We note that
use of a valid subtest is operationally equivalent to making use of a subset of items whose 
purpose is to partition examinees into "comparable" sets as is done in the MH procedure 
described below and other DIF procedures. Hence, the proposed use of a valid subtest in 
the SIB procedure can be interpreted either in the strong sense of our test bias paradigm 
or in the weak sense of the DIF paradigm (of matching of "comparable" examinees). Thus 
use of our statistical procedure for assessing bias in no way requires acceptance of our bias 
framework as opposed to a "comparability" framework, where no claims about "bias" are 
made. 

Using the above conventions, the specification of test bias against the focal group at
$\theta$ becomes

$$T_F(\theta) \equiv \sum_{i=n+1}^{N} T_{iF}(\theta) < \sum_{i=n+1}^{N} T_{iR}(\theta) \equiv T_R(\theta), \qquad (5)$$

because $T_{ig}(\theta) = E[U_i \mid \Theta = \theta, G = g]$ by a simple application of a standard
conditioning formula to Definition 2. $T_g(\theta)$ is called the studied subtest response function
for group $g$.

Unidirectional test bias 

Test bias heretofore has been considered conditional on a single target ability; we now 
turn to a global perspective. If there is test bias against the same group for all 9, then 
there is unidirectional bias against this group. Specifically, if 

$$B(\theta) = T_R(\theta) - T_F(\theta)$$

is the level of bias against Group F at $\theta$, then unidirectional bias holds if either
$B(\theta) > 0$ for all $\theta$ or $B(\theta) < 0$ for all $\theta$. A strong form of unidirectional bias,
termed uniform bias by Mellenbergh (1982), is the type of bias that the modified Mantel-Haenszel
test statistic devised by Holland and Thayer (1988) is designed to detect. Although the
Mantel-Haenszel approach is not dependent on an IRT framework, it can be put in a Rasch
model IRT framework, with the single biased item having group-dependent item difficulties.
Here, the bias is "uniform" in the sense that $T_F(\theta)$ is merely $T_R(\theta)$ shifted horizontally.
Unidirectional bias is less restrictive in that $T_g(\theta)$ does not have to be a logistic IRF and,
more importantly, $T_F(\theta)$ does not have to be a horizontally shifted copy of $T_R(\theta)$.

Since we are concerned with bias against the focal group, it is intuitive that a suitable 
theoretical unidirectional bias index is 

$$\beta_U = \int B(\theta) \, f_F(\theta) \, d\theta, \qquad (6)$$

where $f_F(\theta)$ is the probability density function of $\Theta$ for the focal group. Equivalent
indices weighted by the reference group target ability distribution and by the combined-group
target ability distribution are easily conceptualized.
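As a numerical illustration of the index in (6), the Python sketch below integrates $B(\theta) f_F(\theta)$ with the midpoint rule. It assumes, hypothetically, that each studied item's marginal IRF is a 1.7-scaled logistic curve whose difficulty is larger for the focal group (the "uniform" case mentioned above) and that $f_F$ is a normal density; these are illustrative choices, not the paper's.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def subtest_response(theta, difficulties):
    """T_g(theta): sum of the studied items' marginal IRFs, each taken here to be
    a 1.7-scaled logistic curve with the given difficulty (an assumption)."""
    return sum(logistic(1.7 * (theta - b)) for b in difficulties)

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def beta_u(diff_ref, diff_foc, foc_mean=0.0, foc_sd=1.0, n_points=4000):
    """beta_U = integral of B(theta) f_F(theta) d theta with B = T_R - T_F, eq. (6),
    approximated by the midpoint rule on foc_mean +/- 8 foc_sd, assuming a normal
    focal-group target ability density f_F."""
    lo = foc_mean - 8.0 * foc_sd
    width = 16.0 * foc_sd / n_points
    total = 0.0
    for j in range(n_points):
        theta = lo + (j + 0.5) * width
        b_theta = subtest_response(theta, diff_ref) - subtest_response(theta, diff_foc)
        total += b_theta * normal_pdf(theta, foc_mean, foc_sd) * width
    return total

ref_items = [-0.5, 0.0, 0.5]
foc_items = [b + 0.3 for b in ref_items]  # each studied item harder for the focal group
print(beta_u(ref_items, foc_items))  # positive: unidirectional bias against F
print(beta_u(ref_items, ref_items))  # 0.0: no bias
```

Because every studied item is shifted in the same direction, $B(\theta) > 0$ for all $\theta$, so the index is positive; a small shift in each item still accumulates across the three items.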

THE BASIC PROCEDURE 

The statistical procedure to be presented is based on (6); the hypothesis is

$$H: \beta_U = 0 \quad \text{vs.} \quad \beta_U > 0,$$

the alternative being one-sided to specifically test for bias against the focal group. The
test statistic to be constructed is essentially an estimate of $\beta_U$ normalized to have unit
variance. The estimate of $\beta_U$ is derived first.

Since test bias is analyzed using number correct on the studied subtest, set

$$Y = \sum_{i=n+1}^{N} U_i \qquad (7)$$

to be the studied subtest score; also set $X = \sum_{i=1}^{n} U_i$ to be the valid subtest score. In
selecting the valid subtest score to be number correct, we follow the convention set out in 
Holland and Thayer (1988), among many others. Other choices would of course be possible 
and could improve the performance of the procedure. 

The naive intuition is that examinees with the same valid subtest score are examinees
of approximately equal target ability, and thus such examinees are directly comparable in
the assessment of bias. Thus the difference

$$\bar{Y}_{Rk} - \bar{Y}_{Fk}, \qquad k = 0, \ldots, n, \qquad (8)$$

where $\bar{Y}_{gk}$ is the average $Y$ for all examinees in group $g$ attaining valid subtest score
$X = k$, should provide a measure of the bias against the focal group (resulting from the
reference group having superior nuisance ability $\eta$ on average). In particular, if there is no
bias ($H$ holds), then $\bar{Y}_{Rk} - \bar{Y}_{Fk} = 0$ for all $k$ should be observed, and if there is
unidirectional bias against the focal group ($B(\theta) > 0$ for all $\theta$), then
$\bar{Y}_{Rk} - \bar{Y}_{Fk} > 0$ for all $k$ should be observed, in each case up to statistical error.

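The above assertion needs support; it will suffice to argue that

$$E[\bar{Y}_{Rk} - \bar{Y}_{Fk}] = 0 \text{ for all } k \text{ if } B(\theta) = 0 \text{ for all } \theta, \text{ and}$$
$$E[\bar{Y}_{Rk} - \bar{Y}_{Fk}] > 0 \text{ for all } k \text{ if } B(\theta) > 0 \text{ for all } \theta. \qquad (9)$$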

For now we restrict the target ability distributions to be equal for the two groups; i.e.,
$\Theta \mid G = R$ and $\Theta \mid G = F$ have the same distribution. It is easy to prove (following (5))
under the model presented herein that

$$E[\bar{Y}_{gk}] = E[Y \mid X = k, G = g] = E[T_g(\Theta) \mid X = k, G = g]. \qquad (10)$$

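Now assume that the valid subtest is long enough that the distribution of
$\Theta \mid X = k, G = g$ is tightly concentrated about its mean, and hence that $T_g(\theta)$ is
approximately linear over the range of $\theta$ where this distribution mostly resides. Then

$$E[T_g(\Theta) \mid X = k, G = g] \approx T_g(E[\Theta \mid X = k, G = g]) = T_g(E[\Theta \mid X = k]), \qquad (11)$$

where the approximation holds because expectation commutes with (approximately) linear
functions, and the equality holds because the two target ability distributions are equal, so
that the conditional distribution of $\Theta$ given $X = k$ does not depend on the group.
Thus, denoting $\theta_k = E[\Theta \mid X = k]$,

$$E[\bar{Y}_{Rk} - \bar{Y}_{Fk}] \approx T_R(\theta_k) - T_F(\theta_k) = B(\theta_k). \qquad (12)$$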

Thus (9) follows easily; the $n + 1$ differences in (8) provide an estimate of $B(\theta)$ at $n + 1$
points in the $\theta$-domain. It is intuitive that an estimate of $\beta_U$ is

$$\hat{\beta}_U = \sum_{k=0}^{n} \hat{p}_k (\bar{Y}_{Rk} - \bar{Y}_{Fk}), \qquad (13)$$

where $\hat{p}_k$ is the proportion (among focal group examinees) attaining $X = k$. Specifically,
if $J_{gk}$ is the number of examinees in group $g$ attaining $X = k$, then
$\hat{p}_k = J_{Fk} / \sum_{j=0}^{n} J_{Fj}$.
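The estimator $\hat{\beta}_U$ just defined amounts to a focal-weighted average of cell differences. A minimal Python sketch, assuming each examinee is represented by a pair (valid score $X$, studied score $Y$): as a hypothetical simplification (not the exclusion rules of the paper's Appendix), cells lacking examinees from either group are dropped and $\hat{p}_k$ is renormalized over the focal examinees that remain.

```python
from collections import defaultdict

def cell_means(xs, ys):
    """Return {k: (J_gk, Ybar_gk)}: count and mean studied score Y per valid score X = k."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1
    return {k: (counts[k], sums[k] / counts[k]) for k in counts}

def beta_hat(x_ref, y_ref, x_foc, y_foc):
    """beta_hat_U = sum_k p_hat_k (Ybar_Rk - Ybar_Fk). Cells with no examinees from
    one of the groups are dropped and p_hat_k is renormalised over the retained
    focal-group examinees (a simplifying assumption)."""
    ref = cell_means(x_ref, y_ref)
    foc = cell_means(x_foc, y_foc)
    common = sorted(set(ref) & set(foc))
    n_foc = sum(foc[k][0] for k in common)
    return sum((foc[k][0] / n_foc) * (ref[k][1] - foc[k][1]) for k in common)

x_ref = [0, 0, 1, 1, 2]
y_ref = [1, 2, 2, 3, 4]
x_foc = [0, 1, 1, 2, 2, 3]
y_foc = [1, 2, 2, 3, 3, 1]
print(beta_hat(x_ref, y_ref, x_foc, y_foc))  # approximately 0.7; cell k = 3 has no reference examinees
```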

In the case where the target ability distributions are the same, it is then straightforward
that

$$E[\hat{\beta}_U] \approx \sum_{k=0}^{n} p_k B(\theta_k), \qquad (14)$$



9 

12 



ERIC 



where $p_k = P[X = k \mid G = F]$. Thus the expected value of $\hat{\beta}_U$ is a weighted difference
of marginal IRFs, this weighted difference approximating $\beta_U$, which is a continuously
weighted difference of marginal IRFs. From (14), it follows that $E[\hat{\beta}_U] \approx 0$ if
$\beta_U = 0$, and $E[\hat{\beta}_U] > 0$ if $\beta_U > 0$. This suggests the standardized test statistic

$$B = \frac{\hat{\beta}_U}{\hat{\sigma}(\hat{\beta}_U)} \qquad (15)$$

for testing $H$, where the denominator is defined as

$$\hat{\sigma}(\hat{\beta}_U) = \left( \sum_{k=0}^{n} \hat{p}_k^2 \left[ \frac{\hat{\sigma}^2(Y \mid k, R)}{J_{Rk}} + \frac{\hat{\sigma}^2(Y \mid k, F)}{J_{Fk}} \right] \right)^{1/2}, \qquad (16)$$

where $\hat{\sigma}^2(Y \mid k, g)$ is the sample variance of the studied subtest scores of those group $g$
examinees with valid subtest score $k$. A full description of the computation of the test
statistic, with contingencies for excluding valid subtest scores with inadequate examinee
counts, is presented in the Appendix. $B$ is approximately standard normal when $\beta_U = 0$
and the target ability distributions are the same, because $\hat{\beta}_U$ is a weighted sum of the
differences $\bar{Y}_{Rk} - \bar{Y}_{Fk}$, which are approximately normal (for suitable sample sizes) by
the central limit theorem (the proof of asymptotic normality of $B$ is omitted).
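A minimal Python sketch of the uncorrected statistic in (15) and (16), assuming data given as parallel lists of valid scores $X$ and studied scores $Y$ per group. The rule of dropping cells with fewer than `min_count` examinees in either group, and renormalizing $\hat{p}_k$ over the retained focal examinees, is a hypothetical stand-in for the Appendix's exclusion rules; the p-value is one-sided because the alternative is $\beta_U > 0$.

```python
import math
from statistics import NormalDist, mean, variance

def group_cells(xs, ys):
    """Map each valid subtest score k to the list of studied subtest scores Y in that cell."""
    cells = {}
    for x, y in zip(xs, ys):
        cells.setdefault(x, []).append(y)
    return cells

def sib_statistic(x_ref, y_ref, x_foc, y_foc, min_count=2):
    """Return (beta_hat_U, B) following eqs. (15) and (16), without the regression
    correction. Cells with fewer than min_count examinees in either group are dropped
    (hypothetical rule) and p_hat_k is renormalised over the retained focal examinees."""
    ref, foc = group_cells(x_ref, y_ref), group_cells(x_foc, y_foc)
    kept = [k for k in sorted(set(ref) & set(foc))
            if len(ref[k]) >= min_count and len(foc[k]) >= min_count]
    n_foc = sum(len(foc[k]) for k in kept)
    beta, var = 0.0, 0.0
    for k in kept:
        p_k = len(foc[k]) / n_foc
        beta += p_k * (mean(ref[k]) - mean(foc[k]))
        # Estimated variance of the cell difference: s^2(Y|k,R)/J_Rk + s^2(Y|k,F)/J_Fk.
        var += p_k ** 2 * (variance(ref[k]) / len(ref[k]) + variance(foc[k]) / len(foc[k]))
    if var == 0.0:
        raise ValueError("zero estimated variance: B is undefined")
    return beta, beta / math.sqrt(var)

beta, b_stat = sib_statistic([1, 1, 2, 2], [1, 3, 2, 4], [1, 1, 2, 2], [0, 2, 1, 3])
p_value = 1.0 - NormalDist().cdf(b_stat)  # one-sided p-value
print(beta, b_stat, round(p_value, 4))    # 1.0 1.0 0.1587
```

`min_count` defaults to 2 because `statistics.variance` needs at least two observations per cell.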

The regression correction for target ability difference 

The presence of a difference in target ability distributions in test bias studies has been 
treated in various contexts in the literature. The linking of metrics across groups
in the estimation of IRT item parameters is one such context (see Linn et al. (1981) for an
IRT item bias approach where linking of metrics is crucial). Holland and Thayer (1988) 
also deal with this problem by including the single studied item in the matching criterion 
score of the Mantel-Haenszel test; they prove that this method completely compensates 
for target ability difference (in their context, the distributional difference in the postulated 
unidimensional latent trait) when the underlying IRT model is a Rasch model. Millsap 
and Meredith (1989) elegantly formulate the problem in terms of a divergence of two 
hypotheses (a "conditional on observed score" hypothesis and a "latent trait" hypothesis), 
which would occur if target ability difference is present. A "conditional on observed score" 
procedure such as (15) in its present form is not adequate to address the separation of 
target ability difference from test bias; the presence of target ability difference when in 
fact there is no test bias present can statistically inflate $B$, thereby suggesting that test bias
is present when it is not. It is therefore necessary to formulate a correction for target ability
difference. 




To motivate the proposed correction, it is necessary to show that a decomposition of the
differences $\bar{Y}_{Rk} - \bar{Y}_{Fk}$ into "test bias only" and "target ability difference only" components
is possible. First we note that, by arguments similar to those used in deriving (10) and (11),

$$E[\bar{Y}_{gk}] \approx T_g(\theta_{gk}), \qquad (17)$$

where $\theta_{gk} = E[\Theta \mid k, g]$. In the absence of bias, the condition $E[\bar{Y}_{Rk} - \bar{Y}_{Fk}] = 0$
requires $\theta_{Rk} = \theta_{Fk}$, as in (11) where $g$ was removed from the conditioning; but this may
not happen if the target ability distributions are not the same, as Figure 2 suggests.
Figure 2, which displays densities for four distributions, assumes that the distribution of
$\Theta \mid F$ is stochastically smaller than that of $\Theta \mid R$.



place figure 2 about here 



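Note that the (conditional) distribution of $\Theta \mid k, F$ is stochastically smaller than that
of $\Theta \mid k, R$ for all $k$; the standard Bayesian calculation makes this insight rigorous. Thus
$\theta_{Fk} < \theta_{Rk}$ for all $k$, and, in the absence of bias, where $T_R(\theta) = T_F(\theta) = T(\theta)$ for all $\theta$,

$$E[\bar{Y}_{Fk}] \approx T(\theta_{Fk}) < T(\theta_{Rk}) \approx E[\bar{Y}_{Rk}]$$

($T(\theta)$ is assumed monotone; for mild conditions giving such monotonicity, see Shealy and
Stout (1991)). Thus

$$E[\hat{\beta}_U] \approx \sum_{k=0}^{n} p_k \left( T(\theta_{Rk}) - T(\theta_{Fk}) \right) > 0,$$

so the statistic is inflated even though no bias is present.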
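The claim that $\theta_{Fk} < \theta_{Rk}$ can be checked numerically. The Python sketch below is an illustration rather than the paper's computation: it assumes normal target ability distributions for the two groups (focal mean lower), a valid subtest of 1.7-scaled logistic items with hypothetical difficulties, and computes $\theta_{gk} = E[\Theta \mid X = k, G = g]$ by quadrature, obtaining $P(X = k \mid \theta)$ with the Lord-Wingersky recursion (a standard technique that relies on local independence, not one described in this paper).

```python
import math

def score_distribution(p_correct):
    """P(X = k | theta) for k = 0..n from the item success probabilities at theta,
    by the Lord-Wingersky recursion (valid under local independence)."""
    dist = [1.0]
    for p in p_correct:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)
            new[k + 1] += prob * p
        dist = new
    return dist

def posterior_means(difficulties, mean, sd=1.0, n_points=801):
    """theta_gk = E[Theta | X = k, G = g] for k = 0..n, assuming Theta | G = g is
    Normal(mean, sd) and valid items with IRF 1 / (1 + exp(-1.7 (theta - b)))."""
    n = len(difficulties)
    num = [0.0] * (n + 1)
    den = [0.0] * (n + 1)
    for j in range(n_points):
        theta = mean - 6.0 * sd + 12.0 * sd * j / (n_points - 1)
        weight = math.exp(-0.5 * ((theta - mean) / sd) ** 2)  # unnormalised prior density
        probs = [1.0 / (1.0 + math.exp(-1.7 * (theta - b))) for b in difficulties]
        for k, pk in enumerate(score_distribution(probs)):
            num[k] += theta * weight * pk
            den[k] += weight * pk
    return [num[k] / den[k] for k in range(n + 1)]

valid_b = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
theta_R = posterior_means(valid_b, mean=0.0)   # reference group
theta_F = posterior_means(valid_b, mean=-1.0)  # focal group: lower target ability
for k, (tr, tf) in enumerate(zip(theta_R, theta_F)):
    print(f"k={k}: theta_Rk={tr:+.3f}  theta_Fk={tf:+.3f}")
```

Normalizing constants and the grid spacing cancel in the ratio, so the unnormalized prior weight suffices. Every printed row should show $\theta_{Fk} < \theta_{Rk}$, which is the source of the inflation in the uncorrected statistic.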

In the case where bias is present, we can thus decompose $E[\hat{\beta}_U]$:

$$E[\hat{\beta}_U] \approx \sum_{k=0}^{n} p_k \left( T_R(\theta_{Rk}) - T_F(\theta_{Rk}) \right) + \sum_{k=0}^{n} p_k \left( T_F(\theta_{Rk}) - T_F(\theta_{Fk}) \right)$$
$$= \sum_{k=0}^{n} p_k B(\theta_{Rk}) + \sum_{k=0}^{n} p_k T_F'(\theta_k^*) (\theta_{Rk} - \theta_{Fk}), \qquad (18)$$

where $\theta_k^*$ lies between $\theta_{Rk}$ and $\theta_{Fk}$. ($T_F(\theta)$ is assumed differentiable here, and the
mean value theorem has been applied.) The first term is due only to test bias; the second is
due only to target ability difference.




This approximate decomposition argument is the motivation behind the proposed 
correction. Our strategy is to adjust $\bar{Y}_{Rk}, \bar{Y}_{Fk}$ to $\bar{Y}^*_{Rk}, \bar{Y}^*_{Fk}$ such that the inflating
effect of the group differences in target ability is eliminated. This is accomplished by
constructing $\bar{Y}^*_{Rk}$ and $\bar{Y}^*_{Fk}$ so that they estimate the studied subtest response functions
$T_R(\theta)$ and $T_F(\theta)$ at approximately the same target ability $\bar{\theta}_k$ defined below (as opposed
to two different ones, as is evident from (17)).

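A natural attempt to make adjustments to $\bar{Y}_{Rk}$ and $\bar{Y}_{Fk}$ is to approximate $T_R(\theta)$ and
$T_F(\theta)$ in the neighborhood of $\theta_{Rk}$ and $\theta_{Fk}$ by linear functions. If we assume that $\theta_{Rk}$
and $\theta_{Fk}$ are sufficiently close together to do this, $T_R(\theta)$ and $T_F(\theta)$ can be linearly
interpolated at $\bar{\theta}_k = \frac{1}{2}(\theta_{Rk} + \theta_{Fk})$:

$$T_g(\bar{\theta}_k) \approx T_g(\theta_{gk}) + m_{gk} (\bar{\theta}_k - \theta_{gk}), \qquad (19)$$

where

$$m_{gk} = \frac{T_g(\theta_{g,k+1}) - T_g(\theta_{g,k-1})}{\theta_{g,k+1} - \theta_{g,k-1}};$$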

however, though estimates of $T_g(\theta_{gk})$ (namely, $\bar{Y}_{gk}$) are available for all $k$, estimates
of $\{\theta_{gk} : k = 0, \ldots, n\}$ are not. Abilities on the $\theta$-scale are not observable; however, one
can estimate abilities on the scale defined by the valid subtest, namely

$$v = \bar{P}(\theta),$$

where $\bar{P}(\theta) = \frac{1}{n} \sum_{i=1}^{n} P_i(\theta)$ is the average of the valid subtest IRFs. Then
$\bar{P}(\Theta) \mid G = g$ is the true score of a randomly chosen group $g$ examinee, i.e., the valid
subtest true score for group $g$. Let

$$V_g(x) = E[\bar{P}(\Theta) \mid X = x, G = g], \qquad (20)$$

the (theoretical) regression of true on observed (here, valid) score. $V_g(x)$ can be easily
estimated using classical true score theory, assuming that this regression is linear or
nearly so. The estimation of $V_g(x)$ is deferred to the Appendix; denote this estimator by
$\hat{V}_g(x)$.
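The paper's own estimator of $V_g(x)$ is given in its Appendix, which is not reproduced here. As one standard classical-test-theory choice, assumed for illustration only, the Python sketch below uses Kelley's linear regression of true on observed score, with KR-20 (coefficient alpha for 0/1 items) as the within-group reliability, and expresses the result on the proportion-correct scale of $v = \bar{P}(\theta)$.

```python
from statistics import mean, pvariance

def kr20(responses):
    """KR-20 reliability of a list of 0/1 response vectors (one per examinee)."""
    n_items = len(responses[0])
    total_var = pvariance([sum(r) for r in responses])
    item_var_sum = 0.0
    for i in range(n_items):
        p = mean(r[i] for r in responses)
        item_var_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - item_var_sum / total_var)

def kelley_regression(responses):
    """Return V_hat(x): Kelley's estimate of the valid subtest true score (proportion
    correct) given observed valid score x, fitted within one group (an assumption,
    standing in for the estimator described in the paper's Appendix)."""
    n_items = len(responses[0])
    xbar = mean(sum(r) for r in responses)
    rel = kr20(responses)
    return lambda x: (xbar + rel * (x - xbar)) / n_items

responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # toy valid-subtest data, one group
v_hat = kelley_regression(responses)
print(kr20(responses), v_hat(3), v_hat(0))  # 0.75 0.875 0.125
```

Kelley's formula shrinks each observed score toward its own group mean, so for the same $x$ a lower-scoring focal group receives a lower $\hat{V}_F(x)$ than $\hat{V}_R(x)$, mirroring $\theta_{Fk} < \theta_{Rk}$. The sketch assumes the total scores are not all equal, since KR-20 divides by their variance.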

At this point it is expedient to describe three latent scales, which must be simulta- 
neously considered in order to understand the correction. Figure 3 delineates the three 
scales and should be referred to frequently. 




place figure 3 about here 



So the interpolation (19) must be transformed so as to use the easily estimable $V_g(k)$
instead of $\theta_{gk}$. Through the monotonic transformation $\bar{P}(\theta)$, $V_g(k)$ and $\theta_{gk}$ represent
approximately ("approximately" because $\bar{P}(\theta_{gk}) \approx V_g(k)$ will be demonstrated below)
the same ability on two different latent scales, and thus are for our purposes interchangeable.
Note that $s = T_g(\theta)$ defines a monotonic transformation from the fundamental latent
scale to the studied subtest scale, and $v = \bar{P}(\theta)$ defines one from the fundamental scale
to the valid subtest scale. $T_g(\theta)$ must be transformed so that we can use the valid subtest
scale as its domain, because abilities on this scale can be estimated. Figure 4 illustrates the
appropriate correspondence,



place figure 4 about here 



thus defining a new transformation $S_g(v) = T_g(P^{-1}(v))$ from the valid subtest scale to the studied subtest scale, with domain $(c, 1)$ and range $(c, 1)$ ($c > 0$ is the guessing parameter, assumed common to all items in the test).

With this transformation in hand, the correction can be performed as follows. First, by the same arguments as those used in (10) and (11), with $P(\theta)$ in place of $T_g(\theta)$,

$$V_g(k) \approx P(E[\Theta \mid k, g]) = P(\theta_{gk}). \qquad (21)$$

So $P^{-1}(V_g(k)) \approx \theta_{gk}$ by continuity, and

$$T_g(P^{-1}(V_g(k))) \approx T_g(\theta_{gk}),$$

also by continuity. By definition of $S_g(v)$, this becomes $S_g(V_g(k)) \approx T_g(\theta_{gk})$, and thus, by (17),

$$E[\bar{Y}_{gk}] \approx S_g(V_g(k)). \qquad (22)$$

Thus $\bar{Y}_{gk}$ is a reasonable estimator of $S_g(V_g(k))$ for each $k$. To transform (19) into an interpolation involving $S_g(v)$, we assume that $S_g(v)$ can be approximated by a linear function in a small region about $V_g(k)$, and that $V_R(k)$ and $V_F(k)$ are close enough for the approximation to be effective. We then interpolate $S_R(V_R(k))$ and $S_F(V_F(k))$ to their respective values at $V_k = \tfrac{1}{2}(V_R(k) + V_F(k))$:

$$S_g(V_k) \approx S_g(V_g(k)) + m^*_{gk}\,(V_k - V_g(k)), \qquad (23)$$

where

$$m^*_{gk} = \frac{S_g(V_g(k+1)) - S_g(V_g(k-1))}{V_g(k+1) - V_g(k-1)}$$

is the approximate slope of $S_g(v)$ in the region of $V_g(k)$ and $V_k$. All terms on the right-hand side of (23) are estimable; using $\bar{Y}_{gk}$ to estimate $S_g(V_g(k))$, we define the adjusted mean $\bar{Y}^*_{gk}$:

$$\bar{Y}^*_{gk} = \bar{Y}_{gk} + \hat{M}_{gk}\,(\hat{V}_k - \hat{V}_g(k)), \qquad (24)$$

where, recalling that the estimator $\hat{V}_g(x)$ is given in the Appendix,

$$\hat{M}_{gk} = \frac{\bar{Y}_{g,k+1} - \bar{Y}_{g,k-1}}{\hat{V}_g(k+1) - \hat{V}_g(k-1)}$$

and $\hat{V}_k = \tfrac{1}{2}(\hat{V}_R(k) + \hat{V}_F(k))$. Because the right-hand side of (24) is a good estimator of the right-hand side of (23), $\bar{Y}^*_{gk}$ is a good estimator of $S_g(V_k)$. Finally, $\bar{Y}^*_{gk}$ must be shown to be a good estimator of $T_g(\theta)$ at the same $\theta$ for both groups. By definition of $S_g(v)$, $S_g(V_k) = T_g(P^{-1}(V_k))$. If $\theta_{Rk}$ and $\theta_{Fk}$ are sufficiently close together, then $P(\theta)$ may be taken to be approximately linear in the neighborhood of $\theta_k = (\theta_{Rk} + \theta_{Fk})/2$. Thus, using (21) and this approximate linearity of $P$ near $\theta_k$,

$$V_k = \tfrac{1}{2}(V_R(k) + V_F(k)) \approx \tfrac{1}{2}(P(\theta_{Rk}) + P(\theta_{Fk})) \approx P(\theta_k).$$

Thus, by the continuity of $P(\theta)$,

$$\theta_k \approx P^{-1}(V_k).$$






Hence, by the definition of $S_g(v)$,

$$S_g(V_k) = T_g(P^{-1}(V_k)) \approx T_g(\theta_k).$$

Thus, because $\bar{Y}^*_{gk}$ has been shown to be a good estimator of $S_g(V_k)$, it is also a good estimator of $T_g(\theta_k)$. Hence $\bar{Y}^*_{Rk} - \bar{Y}^*_{Fk}$ is, as desired, a good estimator of $T_R(\theta_k) - T_F(\theta_k)$, i.e., of the difference of the marginal IRFs at the same $\theta$, establishing the usefulness of the interpolation (19).
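The interior part of the correction (24) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes NumPy arrays `ybar_R`, `ybar_F` (the $\bar{Y}_{gk}$) and `v_R`, `v_F` (the $\hat{V}_g(k)$) indexed by $k = 0, \dots, n$; the function name `regression_correct` is hypothetical, and the endpoint rule of Appendix step 6(d) is left out. The usage lines build two groups that share one linear $S(v)$ but sit at different points on the valid subtest scale, mimicking a pure target ability difference with no bias.

```python
import numpy as np


def regression_correct(ybar_R, ybar_F, v_R, v_F):
    """Regression correction (24) at interior score levels k = 1, ..., n-1.

    ybar_g[k]: mean studied-subtest score of group g examinees with valid score k.
    v_g[k]:    estimated regressed true score V_g(k) from the Appendix.
    Endpoint values are returned as NaN (they need the separate step 6(d) rule).
    """
    ybar = {"R": np.asarray(ybar_R, dtype=float), "F": np.asarray(ybar_F, dtype=float)}
    v = {"R": np.asarray(v_R, dtype=float), "F": np.asarray(v_F, dtype=float)}
    v_mid = 0.5 * (v["R"] + v["F"])  # V_k = (V_R(k) + V_F(k)) / 2

    corrected = {}
    for g in ("R", "F"):
        y, vg = ybar[g], v[g]
        ystar = np.full(y.shape, np.nan)
        # Central-difference slope M_gk = (Y_g,k+1 - Y_g,k-1) / (V_g(k+1) - V_g(k-1)).
        slope = (y[2:] - y[:-2]) / (vg[2:] - vg[:-2])
        # Slide each group's mean from V_g(k) to the common point V_k along that slope.
        ystar[1:-1] = y[1:-1] + slope * (v_mid[1:-1] - vg[1:-1])
        corrected[g] = ystar
    return corrected["R"], corrected["F"]


# Two groups share S(v) = 2v + 1 but sit 0.05 apart on the valid-subtest scale.
v_R = np.linspace(0.30, 0.90, 7)
v_F = v_R - 0.05
ystar_R, ystar_F = regression_correct(2 * v_R + 1, 2 * v_F + 1, v_R, v_F)
print(np.max(np.abs(ystar_R[1:-1] - ystar_F[1:-1])))  # zero up to rounding
```

Here the uncorrected means differ by $2 \times 0.05 = 0.1$ at every $k$, while the corrected means coincide, which is what the derivation predicts when $S_g(v)$ is exactly linear.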

(24) is called the regression correction for target ability difference. With the correction (24) in place, (13) can be reconstructed by replacing $\bar{Y}_{gk}$ with $\bar{Y}^*_{gk}$:

$$\hat{\beta}_U = \sum_{k=0}^{n} p_k\,(\bar{Y}^*_{Rk} - \bar{Y}^*_{Fk}), \qquad (25)$$

with $B$ defined as in (15). The hypothesis of no test bias ($H_0\colon \beta_U = 0$) is rejected when $B > z_\alpha$, where $P[N(0,1) > z_\alpha] = \alpha$ defines $z_\alpha$. This procedure will be referred to as the SIB procedure, "SIB" for simultaneous item bias.

Thus the contribution to the differences $\bar{Y}_{Rk} - \bar{Y}_{Fk}$ due to target ability difference has been (approximately) eliminated. It is instructive to note that the correction (24) is the sample analogue of (23), which is essentially the decomposition (19), albeit on a different latent scale (though the two latent scales, $S$ and $V$, are indistinguishable up to a monotonic transformation).

A modification of the basic procedure to achieve better statistical behavior 

Redefine $p_k$ to be the proportion of all examinees (focal and reference groups) attaining $X = k$, that is,

$$p_k = \frac{J_{Fk} + J_{Rk}}{\sum_{k'=0}^{n} (J_{Fk'} + J_{Rk'})}.$$

Substitute this new $p_k$ into (25) and (16) to obtain the statistic $B$ of (15). Because it adhered slightly better to the nominal level of significance in simulation studies when the hypothesis of no test bias holds, this new choice of $p_k$ is recommended over the slightly more intuitive choice based upon focal group examinees alone. When test bias was present, the power of the two versions of $B$ was very similar. The simulation studies reported below are based on this version of the SIB statistic.
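A minimal sketch of the statistic with these pooled weights follows. It assumes that (16) is the usual standard error $\left(\sum_k p_k^2 (S^2_{Rk}/J_{Rk} + S^2_{Fk}/J_{Fk})\right)^{1/2}$ taken over included score levels, as in Appendix step 7; the function names `sib_B` and `reject` are hypothetical. Python's standard `statistics.NormalDist` supplies $z_\alpha$.

```python
from statistics import NormalDist

import numpy as np


def sib_B(ystar_R, ystar_F, s2_R, s2_F, J_R, J_F, include):
    """Return (beta_hat, B) using pooled weights p_k over the included score levels.

    The standard-error form is an assumption modelled on Appendix step 7.
    """
    inc = np.asarray(include, dtype=bool)
    yR, yF = np.asarray(ystar_R, float)[inc], np.asarray(ystar_F, float)[inc]
    sR, sF = np.asarray(s2_R, float)[inc], np.asarray(s2_F, float)[inc]
    nR, nF = np.asarray(J_R, float)[inc], np.asarray(J_F, float)[inc]

    p = (nR + nF) / np.sum(nR + nF)          # pooled (focal + reference) weights
    beta_hat = np.sum(p * (yR - yF))         # corrected estimate of beta_U
    se = np.sqrt(np.sum(p**2 * (sR / nR + sF / nF)))
    return float(beta_hat), float(beta_hat / se)


def reject(B, alpha=0.05):
    """One-sided test: reject H0 (beta_U = 0) in favour of beta_U > 0 if B > z_alpha."""
    return B > NormalDist().inv_cdf(1.0 - alpha)


beta_hat, B = sib_B([1.0, 1.0], [0.9, 0.9], [0.25, 0.25], [0.25, 0.25],
                    [100, 100], [100, 100], [True, True])
print(round(beta_hat, 3), round(B, 3), reject(B))  # 0.1 2.0 True
```

In this toy example $\hat{\beta}_U = 0.1$ with standard error 0.05, so $B \approx 2$, which exceeds $z_{0.05} \approx 1.645$ but not $z_{0.01} \approx 2.326$.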

SIMULATION STUDY 

In order to assess the performance of the procedure in a variety of testing situations, a moderate-sized simulation study (84 simulation cases) was performed. Three-parameter logistic (3PL) item parameters actually estimated from two test data sets, an ACT math test (estimated by Drasgow (1987)) and an ASVAB auto shop test (estimated by Mislevy and Bock (1984)), are used to specify the IRFs in the IRT model. Univariate and bivariate




normal ability distributions, appropriately centered relative to the test item parameters (for good measurability of target ability), are used for the focal and reference groups. Two levels of bias and three levels of target ability difference are simulated; tests with a single biased item and with three biased items are used in the simulations. The level of guessing in the tests is varied. Finally, group size pairs of (3000, 3000), (3000, 1000), and (1500, 1500) for the reference and focal groups, respectively, are used.

Each simulation model is run 100 times (trials). For a particular simulation model, the 
item parameters and the two ability distributions for the two groups are fixed; however, 
at each trial, a new set of examinees (ability parameters) is generated from the ability 
distributions. 

When a single item is to be studied in a simulation, the Mantel-Haenszel procedure as modified by Holland and Thayer is run in parallel to provide an external reference against which to compare our procedure.



Item parameters 

Estimated item parameters from the above mentioned tests were used to construct test 
models; the ASVAB test length is 25, and the ACT test length is 40. Table 1 gives the sum- 
mary statistics for the a's, b's, and c's as estimated by Mislevy and Bock and by Drasgow; 
for the actual parameter values, see Mislevy and Bock (1984) and Drasgow (1987). 



place table 1 here 



The test for each simulation was generated in the following manner. Let $N$ denote test length and $n_b$ the number of items to be studied for possible bias. First, $n_b$ was chosen to be either 1 or 3. There were two cases to consider.

1. No bias: unidimensional items are used for the entire test. 

2. Bias: unidimensional items are used in the valid subtest, and 2-dimensional items are 
used in the studied subtest. 






place table 2 about here 



In the first case, $n_b$ of the $N$ items were chosen randomly to be the studied ones, and the remainder were used as the valid subtest. In the second case, $n = N - n_b$ items were chosen at random from either the ASVAB or the ACT test to be the valid subtest, and the 2-dimensional studied item parameters were chosen according to Table 2. Note that the studied item guessing parameters are a function of the average and standard deviation of the guessing parameters on the ASVAB or ACT test; the studied item $a$'s and $b$'s are the same for both tests.

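The IRFs are, for case 1 (no bias),

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp(-1.7\,a_{i\theta}(\theta - b_{i\theta}))}, \qquad i = 1, \dots, N, \qquad (26)$$

where $a_{i\theta}$ and $b_{i\theta}$ are the target discrimination and difficulty for item $i$. In case 2 (bias), items 1 to $n$ were of the form (26), and items $n + 1$ to $N$ (the studied items) had IRFs

$$P_i(\theta, \eta) = c_i + \frac{1 - c_i}{1 + \exp(-1.7\,(a_{i\theta}(\theta - b_{i\theta}) + a_{i\eta}(\eta - b_{i\eta})))}, \qquad i = n + 1, \dots, N. \qquad (27)$$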

The final factor in determining the item parameters was whether or not to include guessing, that is, whether to assume 2PL or 3PL modeling. The presence of guessing is thought to influence the performance of the procedure. Thus, in some simulation models the estimated $c_i$'s from the literature were used in conjunction with (26) and (27); in others all $c_i$'s were set to 0, producing a 2PL model. A detailed description of the experimental design of the simulations follows.

Ability distributions 

Specifying the ability distributions involves choosing the five parameters determining the bivariate normal distribution for each group in such a way as to meet the following goals:

1. Introduce a specified amount of group difference between target ability distributions. 

2. Require the test to measure the target ability well, as would be true for any "good" 
test. 

3. Introduce a specified amount of potential for bias into the distributions. 

4. In the case of 2-dimensional studied items (bias case), require that examinee nuisance abilities be influential in determining the response to the item, e.g., that focal and reference group examinees have moderate nuisance abilities.




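Each goal is elaborated upon separately below. The bivariate distribution for group $g$ ($g = R$ or $F$) is denoted

$$\begin{pmatrix} \Theta \\ \eta \end{pmatrix} \Bigg|\; G = g \;\sim\; N\!\left[\begin{pmatrix} \mu_{\theta g} \\ \mu_{\eta g} \end{pmatrix}, \begin{pmatrix} \sigma^2(\theta \mid g) & \rho\,\sigma(\theta \mid g)\,\sigma(\eta \mid g) \\ \rho\,\sigma(\theta \mid g)\,\sigma(\eta \mid g) & \sigma^2(\eta \mid g) \end{pmatrix}\right], \qquad (28)$$

where $\rho = \mathrm{Corr}(\Theta, \eta \mid G = g)$ is taken to be the same for both groups (taking $\rho$ different across groups tends to introduce bidirectional bias, in which the marginal IRFs in $\theta$ for the two groups cross; see Shealy (1989)). Note that $\sigma^2(\theta \mid g)$ and $\sigma^2(\eta \mid g)$ are taken to be 1 in our study.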

Goal 1. We first define target ability difference. Some notation is needed: let $\alpha_R$ be the proportion of the entire (conceptual) population of examinees who are reference group members, and let $\alpha_F = 1 - \alpha_R$ be the corresponding proportion for the focal group. (Note: conceptually, as $J_R$ and $J_F$ both increase to $\infty$, $J_R/(J_R + J_F) \to \alpha_R$ and $J_F/(J_R + J_F) \to \alpha_F$. Here $J_g$ denotes the number of sampled group $g$ examinees.) Define

$$d_T = \frac{\mu_{\theta R} - \mu_{\theta F}}{\sigma_p} \qquad (29)$$

to be the target ability difference between the focal and reference groups, where

$$\sigma_p^2 = \alpha_R\,\sigma^2(\Theta \mid R) + \alpha_F\,\sigma^2(\Theta \mid F). \qquad (30)$$

Note that when (28) holds (with unit variances), $\sigma_p^2 = 1$ and thus $d_T = \mu_{\theta R} - \mu_{\theta F}$. $d_T$ is a quantity specified in the simulations.

Goal 2. The criterion used to ensure good measurability of $\theta$ by the test is that the average difficulty $\bar{b}$ of the valid subtest should be close to the average target ability over the pooled groups. Specifically, $\mu_{\theta R}$ and $\mu_{\theta F}$ are chosen so that

$$\bar{b} = E[\Theta] = \alpha_R\,\mu_{\theta R} + \alpha_F\,\mu_{\theta F}. \qquad (31)$$

$\bar{b}$ is taken from Table 1. $\mu_{\theta R}$ and $\mu_{\theta F}$ are completely determined by the specification of $d_T$ and (31).

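Goal 3. We use a more restrictive version of Definition 1 to define potential for bias: set

$$C_\beta(\theta) = E[\eta \mid \Theta = \theta, G = R] - E[\eta \mid \Theta = \theta, G = F]. \qquad (32)$$

$C_\beta(\theta) > 0$ is defined to be the potential for bias against the focal group. When (28) holds (with unit variances), (32) becomes

$$C_\beta(\theta) = C_\beta = \mu_{\eta R} - \rho\,\mu_{\theta R} - (\mu_{\eta F} - \rho\,\mu_{\theta F}) = (\mu_{\eta R} - \mu_{\eta F}) - \rho\,(\mu_{\theta R} - \mu_{\theta F}) = (\mu_{\eta R} - \mu_{\eta F}) - \rho\, d_T, \qquad (33)$$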




$\theta$ dropping out because the ability correlation $\rho$ is equal for both groups. Note that because $C_\beta$ is constant for all $\theta$, unidirectional bias is being introduced. For a specified amount of $C_\beta$, $\mu_{\eta R}$ and $\mu_{\eta F}$ are partially determined. The reader should note that potential for bias can hold even though $\mu_{\eta R} = \mu_{\eta F}$, unless $\mu_{\theta F} = \mu_{\theta R}$.

Goal 4. The criterion used to ensure that the nuisance determinant is influential is the following. The nuisance difficulties of all studied items were chosen to be 0. For an arbitrarily chosen target ability (say $\theta = 0$) we thus want the average nuisance ability to be near 0 as well. Thus we choose

$$E[\eta \mid \Theta = 0, G = R] = -E[\eta \mid \Theta = 0, G = F], \qquad (34)$$

i.e., the conditional nuisance expectations at $\theta = 0$ for the reference and focal groups are centered around the average studied item nuisance difficulty of 0. Our intent in this study was to introduce bias against the focal group, so $E[\eta \mid 0, R] > 0$ in (34), and thus we get

$$0 < \mu_{\eta R} - \rho\,\mu_{\theta R} = -(\mu_{\eta F} - \rho\,\mu_{\theta F}); \qquad (35)$$

this will specify $\mu_{\eta R}$ and $\mu_{\eta F}$, along with the specification of $C_\beta$ in (33).

There is an additional issue here: how large should $C_\beta$ be chosen to introduce a "moderate" or "severe" amount of bias into the 2-dimensional studied items of Table 2? This is treated below, in the experimental design of the study.

Goals 1-4 now completely specify (28): $\mu_{\theta R}$, $\mu_{\theta F}$, $\mu_{\eta R}$, and $\mu_{\eta F}$ can be found by solving (29), (31), (33), and (35) simultaneously. The remaining parameters are fixed: $\rho = .5$ and $\sigma(\theta \mid g) = \sigma(\eta \mid g) = 1$.
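Because (29), (31), (33), and (35) are linear in the four means, they can be solved in closed form. With unit variances, (29) and (31) give $\mu_{\theta R} = \bar{b} + \alpha_F d_T$ and $\mu_{\theta F} = \bar{b} - \alpha_R d_T$, while (33) and (35) together force $\mu_{\eta g} - \rho\,\mu_{\theta g} = \pm C_\beta/2$. The sketch below is an illustration under those assumptions; the function name `ability_means` is hypothetical, and taking $\alpha_R$ to be the sample proportion $J_R/(J_R + J_F)$ is an assumption (the text justifies it only as a limit).

```python
def ability_means(d_T, b_bar, C_beta, alpha_R, rho=0.5):
    """Solve (29), (31), (33), (35) for the four means, assuming unit variances."""
    alpha_F = 1.0 - alpha_R
    # (29) with sigma_p = 1 and (31): a 2x2 linear system in the target means.
    mu_theta_R = b_bar + alpha_F * d_T
    mu_theta_F = b_bar - alpha_R * d_T
    # (35): E[eta | 0, R] = -E[eta | 0, F]; (33): their difference is C_beta.
    # Hence E[eta | Theta = 0, g] = mu_eta_g - rho * mu_theta_g = +/- C_beta / 2.
    mu_eta_R = C_beta / 2.0 + rho * mu_theta_R
    mu_eta_F = -C_beta / 2.0 + rho * mu_theta_F
    return mu_theta_R, mu_theta_F, mu_eta_R, mu_eta_F


# Illustrative inputs: J_R/J_F = 3000/1000, so alpha_R is taken as 0.75.
print(ability_means(d_T=1.0, b_bar=0.5, C_beta=0.2, alpha_R=0.75))
# approximately (0.75, -0.25, 0.475, -0.225)
```

Solving in closed form rather than with a generic linear solver makes it easy to see how each goal pins down one pair of means.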

Choice of $C_\beta$

The amount of potential for bias $C_\beta$ in each simulation model was chosen so that the actual level of bias $\beta_U$ produced allows the power behavior of the statistic to be well assessed for the given examinee sample sizes, valid subtest used (recall Table 1), and biased items used (recall Table 2). These $\beta_U$ values (rounded to two significant figures) are shown in Table 3. The governing equations determining $C_\beta$ from $\beta_U$ were

$$\beta_U = \int [T_R(\theta) - T_F(\theta)]\, f_F(\theta)\, d\theta,$$

where $f_F$ is the density of $\Theta$ in the focal group and

$$T_g(\theta) = \sum_{i=n+1}^{N} E[P_i(\theta, \eta) \mid \Theta = \theta, G = g], \qquad (36)$$

with $P_i(\theta, \eta)$ defined in (27), the item parameters in (27) defined in Table 2, and the





place table 3 about here 



parameters of the $(\Theta, \eta)$ distribution determined from (29), (31), (33), and (35). One standard often used to interpret the magnitude of the bias from a practitioner's viewpoint is that the bias is "moderate" if $0.5 < \Delta_{MH} < 1$ and "large" if $\Delta_{MH} > 1$, where $\Delta_{MH}$ is the theoretical index based on the Mantel-Haenszel log odds ratio proposed by Holland and Thayer (1988). The rationales for $\Delta_{MH}$ and $\beta_U$ differ, but for $n_b = 1$ and unidirectional bias they tend to be highly correlated and are crudely related by

$$\beta_U \approx \Delta_{MH}/10.$$

Thus, roughly, $0.05 < \beta_U < 0.1$ would constitute moderate bias, while $\beta_U > 0.1$ would constitute large bias. Thus in the $n_b = 1$ case, referring to Table 4, the amount of bias being simulated is actually either (low) moderate or small. Examination of (36) shows that $\beta_U$ is a measure of how much lower the probability of getting the biased item right is for an average focal group examinee than for an average reference group examinee of the same target ability. Thus $\beta_U$ has a natural and useful empirical interpretation. In our context, $\Delta_{MH}$, by contrast, is a measure of the horizontal distance between $T_R(\theta)$ and $T_F(\theta)$ at $y = (1 + \bar{c})/2$ (i.e., the value of $T_F^{-1}((1 + \bar{c})/2) - T_R^{-1}((1 + \bar{c})/2)$), where $\bar{c}$ is defined in Table 1.
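To see how a chosen $C_\beta$ translates into $\beta_U$, the integrals can be evaluated numerically. The sketch below assumes a single studied item with IRF (27), unit variances and correlation $\rho$ as in (28), and takes $\beta_U$ to be the focal-group average of $T_R(\theta) - T_F(\theta)$. It uses NumPy's Gauss-Hermite rule `numpy.polynomial.hermite_e.hermegauss` for both the $\eta \mid \theta$ and the $\Theta \mid F$ expectations. The function names and the illustrative means (built with $d_T = 0.5$, $\bar{b} = 0.5$, $\alpha_R = 0.5$, $\rho = 0.5$) are assumptions, not values from the study, so the outputs are not meant to reproduce Table 3.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def irf_2d(theta, eta, a_th, b_th, a_eta, b_eta, c):
    """Two-dimensional IRF of the form (27), with the 1.7 scaling constant."""
    z = 1.7 * (a_th * (theta - b_th) + a_eta * (eta - b_eta))
    return c + (1.0 - c) / (1.0 + np.exp(-z))


def beta_U(means, item, rho=0.5, deg=40):
    """Focal-weighted average of T_R(theta) - T_F(theta) for one studied item.

    means = (mu_theta_R, mu_theta_F, mu_eta_R, mu_eta_F); unit variances assumed.
    item  = (a_th, b_th, a_eta, b_eta, c).
    """
    mu_tR, mu_tF, mu_eR, mu_eF = means
    x, w = hermegauss(deg)        # nodes/weights for the weight function exp(-x**2 / 2)
    w = w / w.sum()               # sum(w * f(x)) now approximates E[f(Z)], Z ~ N(0, 1)
    theta = mu_tF + x             # quadrature nodes for Theta | F ~ N(mu_tF, 1)
    sd = np.sqrt(1.0 - rho**2)    # sd of eta given theta under (28)

    def T(mu_t, mu_e):
        # T_g(theta) = E[P(theta, eta) | theta, g], eta | theta ~ N(mu_e + rho (theta - mu_t), sd^2)
        eta = (mu_e + rho * (theta - mu_t))[:, None] + sd * x[None, :]
        return irf_2d(theta[:, None], eta, *item) @ w

    return float(np.sum(w * (T(mu_tR, mu_eR) - T(mu_tF, mu_eF))))


item = (1.0, 0.0, 0.8, 0.0, 0.0)          # Table 2, n_b = 1 item, no guessing
no_bias = (0.75, 0.25, 0.375, 0.125)      # C_beta = 0
biased = (0.75, 0.25, 0.475, 0.025)       # C_beta = 0.2
print(beta_U(no_bias, item), beta_U(biased, item))
```

With $C_\beta = 0$ the conditional nuisance distributions coincide, so $\beta_U$ is zero up to rounding; with $C_\beta = 0.2$ it is positive, and it grows with $C_\beta$.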



place table 4 about here 



Experimental design 

The design is as follows. For the case of no test bias ($C_\beta = 0$), for each test type (ASVAB Auto Shop or ACT Math) the following simulations are done:

$$n_b \in \{1, 3\} \;\times\; d_T \in \{0.0,\ 0.5,\ 1.0\} \;\times\; J_R/J_F \in \{3000/3000,\ 3000/1000,\ 1500/1500\} \;\times_D\; \{\text{guessing},\ \text{no guessing}\}.$$

Here "guessing" means that the estimated ACT and ASVAB guessing parameters are used in the model, and "no guessing" means that all $c$'s are set to zero, that is, 2PL modeling is used. Also, "$D$" means that this guessing "factor" is randomly assigned within the 36 levels produced by crossing the other factors ($2 \times 3 \times 3 = 18$ levels for each of the two test types).

For the case of test bias ($C_\beta > 0$) the following simulations are done for each test type:

$$n_b \in \{1, 3\} \;\times\; d_T \;\times\; C_\beta \in \{0.2,\ 0.3\} \;\times\; J_R/J_F \;\times\; \{\text{guessing},\ \text{no guessing}\}.$$



For $n_b = 1$, the nuisance discrimination $a_{N\eta}$ of the studied item is .8; for $n_b = 3$, the nuisance discrimination of each of the 3 studied items is .4. These discriminations were chosen so that the power of the procedure could be well assessed (i.e., so that it would not be too close to 1). It is informative to note in passing that the power of the procedure is expected to be greater when $n_b$ is increased from 1 to 3 unless each item individually displays less bias in the $n_b = 3$ case. This is why $a_{i\eta}$ ($i = N - 2, N - 1, N$) was chosen to be .4 in the $n_b = 3$ case, $\tfrac{1}{2}$ of that used in the $n_b = 1$ case.

There are therefore 48 simulation models that incorporate bias. Thus, a total of 
84 simulation models were used in the simulation study. 

RESULTS OF THE SIMULATION STUDY 

The results of the simulation study are given in Tables 5-8 and 9-12, with Tables 5-8 summarizing the no-test-bias simulations and Tables 9-12 summarizing the simulations having test bias present. The $c$ column indicates whether or not the model has guessing present. In all $n_b = 1$ cases the Mantel-Haenszel rejection rate for the hypothesis of no item bias (based on 100 trials) is reported in the MH column. In all cases the SIB rejection rate is reported in the SIB column. In all cases where test bias is present (Tables 9-12), the $C_\beta$ column presents the amount of potential for bias present (recall (33)); the $\beta_U$ column presents our index of the amount of bias present against the focal group in the model





(recall (6)); the $\bar{\hat{\beta}}_U$ column is the average of the estimates $\hat{\beta}_U$ of $\beta_U$ over the 100 trials; and the $\Delta_{MH}$ column presents the amount of bias present against the focal group in the model from the Mantel-Haenszel perspective.

Tables 5-8 indicate that both the SIB statistic and the MH statistic display reasonable adherence to the nominal level of significance of 0.05. There appear to be situations of no bias, with a target ability difference and a departure from the Rasch model, in which the Mantel-Haenszel procedure displays inflated Type 1 error. (See Zwick (1990) for a discussion of this problem and an illustrative example.) There is evidence (Shealy (1989)) that in such situations the SIB statistic adheres closely to the nominal level of significance. On the other hand, there are likely portions of the "parameter space" of realistic IRT models where our linear regression correction is stressed and hence the MH would likely display better Type 1 error performance. More study is required before it can be claimed that either MH or SIB displays superior Type 1 error performance. The striking fact is that both procedures seem to be quite robust against the inflating effect on Type 1 error of differing target ability distributions. In this regard, $d_T = 1$ is certainly a large amount of target ability difference from the practitioner's perspective.

Tables 9 and 11 indicate that both the SIB statistic and the MH statistic are quite 
powerful against moderate amounts of bias and fairly powerful against small amounts of 
bias in a single biased item. Untabulated simulation studies for larger amounts of bias 
produced rejection rates of essentially unity for both the SIB and MH procedures. 

Tables 10 and 12 indicate that the SIB procedure is quite powerful against moderate amounts of bias resulting from several (3 here) items producing bias in the same direction. The reader should recall that the amount of bias per item was lowered for the $n_b = 3$ case by reducing the discrimination in the nuisance dimension from $a_{N\eta} = 0.8$ to $a_{i\eta} = 0.4$ for the studied items. In both the $n_b = 1$ and $n_b = 3$ cases, the potential for bias as measured by $C_\beta$ was kept the same ($C_\beta = 0.2$ or $0.3$). These two tables show, as claimed, that the SIB procedure can successfully detect simultaneous item bias, even if the amount of bias present per item is small.

Tables 9 and 11 show, for the particular bias models of the simulation study, that SIB 
is somewhat more powerful than MH, averaging 0.07 higher for those models for which 
rejection rates are < 0.9. We do not know whether this greater SIB power generalizes to 
other models of bias. 

Tables 9-12 provide evidence about the ability of $\hat{\beta}_U$ to estimate our measure of the amount of bias present. For each case, $\bar{\hat{\beta}}_U - \beta_U$ indicates the amount of statistical bias one might expect in using $\hat{\beta}_U$. Clearly, statistical bias of roughly +0.01 is present. The estimated standard errors of $\hat{\beta}_U$ are not recorded, but they averaged roughly 1/3 of $\beta_U$. Thus if $\hat{\beta}_U = 0.05$, there is likely a bias of 0.01 and a standard error of about 0.017. Thus, crudely, a 95% confidence interval (if asymptotic normality is a good approximation) would be given by $0.04 \pm 0.033$ (since $1.96 \times 0.017 \approx 0.033$). Here $0.04 = 0.05 - 0.01$ is the correction for statistical bias. It would seem that $\hat{\beta}_U$ provides a useful empirical index of the amount of bias present in a studied subtest of items; more work is planned on its theoretical and empirical properties.

SUMMARY AND CONCLUSIONS 

The SIB procedure was designed to test for unidirectional test bias residing in one or more items, using the conception that test bias is incipient within the two groups' ability distributions (in terms of a difference in conditional nuisance ability distributions). By means of the regression correction presented here, the inflation of the SIB test statistic due to target ability difference (one group having a stochastically larger distribution of $\Theta$) is removed. This correction represents a conceptual link between conditional-on-observed-score methods and IRT-based methods, just as does the practice of including the studied item in the matching criterion of the Mantel-Haenszel procedure of Holland and Thayer (1988). The correction adjusts the studied subtest scores of the two groups so that, in the case of no test bias, they estimate the same quantity at the same latent IRT ability, even if group target ability differences exist. It is useful to note that the adjustment, although conceptually based upon multidimensional IRT modeling, is in fact computed using a classical approach and hence does not depend on IRT ability or item parameter estimation.

A moderate (84 models) simulation study shows that both MH and SIB display good 
adherence to the nominal level of significance, even for large (d T = 1) target ability differ- 
ences. In the case of a single biased item, both MH and SIB display good power with SIB 
displaying slightly higher power. As designed, the SIB statistic displays good power in the 
case of several biased items (3 here), even when the amount of bias/item is fairly small. 

A large scale simulation study is in progress with the goal of obtaining a better un- 
derstanding of the performance characteristics of both the SIB and the MH statistics with 
particular emphasis on investigation of statistical power and adherence to the nominal 
level of significance. Based upon the completed portion of this simulation study reported 
herein, we would recommend that practitioners use the SIB and MH statistics simultane- 
ously. Both are extremely easy to compute and for moderate sized data sets run quickly on 
a typical PC configuration. Carefully checked code with a user-oriented driver is available from the authors for running both the SIB and MH statistics on real data sets and also for doing simulation studies of performance.






APPENDIX 

1. Derivation of the estimated regression of true on observed valid subtest score, for $k = 0, \dots, n$.

Recall that $V_g(k) = E[P(\Theta) \mid k, g]$ needs to be estimated in order for $S_g(V_k)$ of (23) to be estimated. Suppressing $g$ for simplicity, we need to estimate $V(k)$ at $k = 0, 1, \dots, n$. Although $V(k)$ is not necessarily linear in $k$ (see Shealy (1989), p. 87ff, for a discussion), as an approximation we assume that $nV(k)$ is linear in $k$, i.e.,

$$nV(k) = a + bk.$$

To estimate $V(k)$, we consider the true score model for the valid subtest score $X$:

$$X = T + e, \qquad (A1)$$

where

$$E(e) = 0, \qquad \mathrm{cov}(T, e) = 0 \qquad (A2)$$

is assumed and the true score $T$ has the latent variable representation $T = nP(\Theta)$. Thus

$$nV(k) = E[T \mid k].$$



Standard regression theory for $E(T \mid k)$ yields

$$V(k) = \frac{1}{n}\left(ET + \rho_{XT}\,\frac{\sigma_T}{\sigma_X}\,(k - EX)\right). \qquad (A3)$$



But, for the true score model given by (A1) and (A2),

$$\rho_{XT}\,\frac{\sigma_T}{\sigma_X} = 1 - \frac{\sigma^2(e)}{\sigma^2(X)} \qquad (A4)$$



is well known (see page 61 of Lord and Novick (1968)). Using (A1) and (A2), $ET = EX$ holds. Thus, by (A3) and (A4),

$$V(k) = \frac{1}{n}\left(EX + \left(1 - \frac{\sigma^2(e)}{\sigma^2(X)}\right)(k - EX)\right) \qquad (A5)$$

holds.
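The identity (A4) and the step from (A3) to (A5) follow from (A1) and (A2) alone. As a worked check of this classical result (written out here rather than quoted from Lord and Novick):

$$\mathrm{cov}(X, T) = \mathrm{cov}(T + e, T) = \sigma_T^2, \qquad \sigma_X^2 = \sigma_T^2 + \sigma^2(e),$$

$$\rho_{XT}\,\frac{\sigma_T}{\sigma_X} = \frac{\mathrm{cov}(X, T)}{\sigma_X\,\sigma_T}\cdot\frac{\sigma_T}{\sigma_X} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma^2(e)}{\sigma^2(X)},$$

and $ET = EX - E(e) = EX$, so substituting both into (A3) gives (A5).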

Clearly $EX = E[X \mid g]$ can be estimated by the average valid subtest score $\bar{X}_g$ of all group $g$ examinees taking the test. Thus it remains to estimate $\sigma^2(e)/\sigma^2(X)$.






$\sigma^2(X) \equiv \sigma^2(X \mid g)$ can clearly be estimated by the usual sample variance of all group $g$ examinees taking the test,

$$\hat{\sigma}^2(X \mid g) = \frac{1}{J_g} \sum_{j=1}^{J_g} (X_{gj} - \bar{X}_g)^2, \qquad (A6)$$

where $J_g$ denotes the number of group $g$ examinees taking the test and $X_{gj}$ is the valid subtest number-correct score of the $j$th such group $g$ examinee. It remains to estimate $\sigma^2(e)$; denote this estimator by $\hat{\sigma}^2(e)$. Then the desired estimator of $\sigma^2(e)/\sigma^2(X)$ will be given by $\hat{\sigma}^2(e)/\hat{\sigma}^2(X)$. A standard conditioning formula yields, indexing the valid subtest items by $i = 1, 2, \dots, n$, and setting $X_g = X \mid g$, $\Theta_g = \Theta \mid g$ as a reminder that sampling here is from group $g$ only,

$$\sigma^2(X \mid g) = \sigma^2(X_g) = \sigma^2(E[X_g \mid \Theta_g]) + E[\sigma^2(X_g \mid \Theta_g)] = \sigma^2(nP(\Theta_g)) + \sum_{i=1}^{n} E[P_i(\Theta_g)(1 - P_i(\Theta_g))], \qquad (A7)$$

using the standard item response theory assumption of local independence of items, given $\theta$. Also, by (A2) it is trivial that

$$\sigma^2(X \mid g) = \sigma^2(nP(\Theta) \mid g) + \sigma^2(e \mid g).$$

Thus, by (A7),

$$\sigma^2(e \mid g) = \sum_{i=1}^{n} E[P_i(\Theta_g)(1 - P_i(\Theta_g))].$$

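This suggests

$$\hat{\sigma}^2(e \mid g) = \sum_{i=1}^{n} \bar{U}_{gi}(1 - \bar{U}_{gi}), \qquad (A8)$$

where $\bar{U}_{gi}$ is the proportion correct for group $g$ examinees on valid subtest item $i$. Thus, using (A5), we will estimate $V_g(k)$ by

$$\hat{V}_g(k) = \frac{1}{n}\left(\bar{X}_g + \left(1 - \frac{\hat{\sigma}^2(e \mid g)}{\hat{\sigma}^2(X \mid g)}\right)(k - \bar{X}_g)\right). \qquad (A9)$$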
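The estimator (A9) needs only the raw score distribution and the item proportions correct, so it is easy to compute. Below is a minimal NumPy sketch for one group's 0/1 valid-subtest response matrix. The function name `v_hat` is hypothetical, the guessing adjustment of procedure step 5(a) is not applied, and the optional `kr20` flag, which multiplies the coefficient by $n/(n-1)$ in the style of the KR-20 reliability coefficient, is an assumption rather than part of (A9). The simulated data in the usage lines (20 items, 1.7-scaled logistic responses, seed 0) are purely illustrative.

```python
import numpy as np


def v_hat(U, kr20=False):
    """Estimate V_g(k), k = 0..n, from one group's 0/1 valid-subtest item scores.

    U has shape (J_g, n).  Follows (A6), (A8), (A9); kr20=True multiplies the
    coefficient by n/(n-1), an optional assumption not taken from (A9).
    """
    U = np.asarray(U, dtype=float)
    J, n = U.shape
    X = U.sum(axis=1)                       # valid-subtest number-correct scores X_gj
    xbar = X.mean()                         # estimate of EX
    var_X = X.var()                         # (A6): divisor J_g
    ubar = U.mean(axis=0)                   # proportion correct per item
    var_e = np.sum(ubar * (1.0 - ubar))     # (A8)
    rho = 1.0 - var_e / var_X               # 1 - sigma^2(e) / sigma^2(X)
    if kr20:
        rho *= n / (n - 1.0)
    k = np.arange(n + 1)
    return (xbar + rho * (k - xbar)) / n    # (A9)


rng = np.random.default_rng(0)
theta = rng.normal(size=(2000, 1))
b = np.linspace(-1.5, 1.5, 20)
U = (rng.random((2000, 20)) < 1.0 / (1.0 + np.exp(-1.7 * (theta - b)))).astype(int)
V = v_hat(U)   # increasing in k, shrunk toward the group mean proportion correct
```

Because $\hat{\sigma}^2(X \mid g)$ with divisor $J_g$ equals the sum of item variances plus the inter-item covariances, the coefficient lies strictly between 0 and 1 whenever items are positively correlated, so $\hat{V}_g(k)$ is a shrunken version of $k/n$.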



2. The complete procedure to detect test bias, using the proposed regression correction.

The SIB procedure in its entirety is presented here. First we set some basic notation. Group $g$ ($g = R$ or $F$) has $J_g$ examinees taking the test of $N$ items. The response to item $i$ of the $j$th group $g$ examinee is $U_{gij}$. The subtest scores are

$$X_{gj} = \sum_{i=1}^{n} U_{gij} \;\;(\text{valid subtest score}), \qquad Y_{gj} = \sum_{i=n+1}^{N} U_{gij} \;\;(\text{studied subtest score}).$$




The classical group item difficulties are $\bar{U}_{gi} = (1/J_g) \sum_{j=1}^{J_g} U_{gij}$. Let $\sum_{j:k}$ denote summation over those group $g$ examinees $j$ with $k$ correct on the valid subtest.

1. Compute $J_{gk}$, the number of group $g$ examinees with $k$ correct on the valid subtest.

2. Compute $\bar{Y}_{gk} = (1/J_{gk}) \sum_{j:k} Y_{gj}$ and the sample variance $S^2_{gk}$ of the $Y_{gj}$ over the same examinees. If $J_{gk} = 0$, set $\bar{Y}_{gk} = 0$; if $J_{gk} \le 1$, set $S^2_{gk} = 0$. $\bar{Y}_{gk}$ is the sample average studied subtest score of group $g$ examinees attaining $X_g = k$, and $S^2_{gk}$ is the sample variance.

3. Compute $\hat{p}_g(k) = J_{gk}/J_g$ for both groups and all $k$; $\hat{p}_g(k)$ is the estimate of the histogram of $X \mid G = g$. Then compute $\tilde{p}_g(k)$, the MLE of the histogram of $X \mid G = g$ over the class of all unimodal histograms on the $n + 1$ possible values ($X \mid G = g$ is assumed to have a unimodal distribution, and hence its estimate $\{\tilde{p}_g(k), k \ge 0\}$ should also be unimodal). For details of this procedure, using the up-and-down-blocks algorithm, see Barlow et al. (1972, pp. 72-73 and 223-231).

4. Set $I(k) = 1$ for all $k$ unless

(a) $k = 0$ or $n$,

(b) $S^2_{Rk} = 0$ or $S^2_{Fk} = 0$,

(c) $J_R\,\tilde{p}_R(k) < J_{\min}$ or $J_F\,\tilde{p}_F(k) < J_{\min}$, where $J_{\min}$ is set by the user, usually around 30, or

(d) $k < n\hat{c}$, where $\hat{c} > 0$ is the user-specified global guessing parameter for the test. (It is assumed that there is a relatively constant level of guessing across items, and that there is at least partial knowledge of this guessing value.)

In any of these cases set $I(k) = 0$. $I(k)$, $k = 0, \dots, n$, is the examinee inclusion indicator; it is 1 if examinees with $X = k$ are to have their responses included in the test statistic. Exclusion (a) removes the two extreme valid subtest scores because they estimate target ability poorly. Exclusion (b) is obvious. Exclusion (c) ensures that each valid subtest score category has enough examinees to make $\bar{Y}_{Rk}$ and $\bar{Y}_{Fk}$ approximately normal; the unimodal mass function is used so that only extreme valid subtest score categories are excluded. As for (d), all valid subtest scores below the level expected from guessing are excluded.

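5. Compute the regression of true score on valid subtest score:

(a) $\tilde{U}_{gi} = (\bar{U}_{gi} - \hat{c})/(1 - \hat{c})$; if the result is < 0, set it to 0 (adjustment for guessing).

(b) $\bar{X}_g = (1/J_g) \sum_{j=1}^{J_g} X_{gj}$.

(c) $\hat{\sigma}^2(X \mid g) = (1/J_g) \sum_{j=1}^{J_g} (X_{gj} - \bar{X}_g)^2$.

(d) $\hat{\sigma}^2(e \mid g) = \sum_{i=1}^{n} \tilde{U}_{gi}(1 - \tilde{U}_{gi})$.

(e) $\hat{b}_g = \dfrac{n}{n-1}\left(1 - \dfrac{\hat{\sigma}^2(e \mid g)}{\hat{\sigma}^2(X \mid g)}\right)$.

(f) $\hat{V}_g(k) = \frac{1}{n}(\bar{X}_g + \hat{b}_g(k - \bar{X}_g))$ for both $g$ and $k = 0, \dots, n$.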
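6. Make the regression correction:

(a) $k_l = \min\{k : I(k) = 1\}$, $k_r = \max\{k : I(k) = 1\}$.

(b) $\hat{V}_k = \tfrac{1}{2}(\hat{V}_R(k) + \hat{V}_F(k))$, $k_l \le k \le k_r$.

(c) For $k_l < k < k_r$, compute

$$\hat{M}_{gk} = \frac{\bar{Y}_{g,k+1} - \bar{Y}_{g,k-1}}{\hat{V}_g(k+1) - \hat{V}_g(k-1)}.$$

Then compute $\bar{Y}^*_{gk} = \bar{Y}_{gk} + \hat{M}_{gk}(\hat{V}_k - \hat{V}_g(k))$.

(d) For $k = k_l$ and $k = k_r$, compute $\bar{Y}^*_{gk}$ in the following way.

i. Define

$$\hat{S}_g(v) = \begin{cases} \bar{Y}_{g0} & \text{if } v < \hat{V}_g(0), \\ (1 - a)\,\bar{Y}_{g,k+1} + a\,\bar{Y}_{gk} & \text{if } \hat{V}_g(k) \le v < \hat{V}_g(k+1), \\ \bar{Y}_{gn} & \text{if } v \ge \hat{V}_g(n), \end{cases}$$

and

$$a = \frac{\hat{V}_g(k+1) - v}{\hat{V}_g(k+1) - \hat{V}_g(k)}.$$

$\hat{S}_g(v)$ is the linear interpolation of $\{\bar{Y}_{g0}, \dots, \bar{Y}_{gn}\}$.

ii. Compute $\bar{Y}^*_{gk} = \hat{S}_g(\hat{V}_k)$ for $k = k_l$ and $k = k_r$.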
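7. Compute the bias statistic.

(a) Compute $J^*_g = \sum_{k=0}^{n} I(k)\,J_{gk}$, the number of included group $g$ examinees, and $p_k = I(k)(J_{Rk} + J_{Fk})/(J^*_R + J^*_F)$.

(b) Compute

$$B = \frac{\sum_{k:\,I(k)=1} p_k\,(\bar{Y}^*_{Rk} - \bar{Y}^*_{Fk})}{\left(\sum_{k:\,I(k)=1} p_k^2 \left(\dfrac{S^2_{Rk}}{J_{Rk}} + \dfrac{S^2_{Fk}}{J_{Fk}}\right)\right)^{1/2}}.$$

(c) Reject $H_0\colon \beta_U = 0$ in favor of $\beta_U > 0$ at level $\alpha$ if $B > z_\alpha$, where $P[N(0,1) > z_\alpha] = \alpha$ defines $z_\alpha$.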
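Steps 4 and 6(d) carry most of the procedure's bookkeeping. The sketch below (function names hypothetical) computes the inclusion indicator $I(k)$ from rules (a)-(d) and the endpoint values $\bar{Y}^*_{gk} = \hat{S}_g(\hat{V}_k)$ using `numpy.interp`, which is linear between the points $(\hat{V}_g(k), \bar{Y}_{gk})$ and constant outside $[\hat{V}_g(0), \hat{V}_g(n)]$. It assumes the caller supplies the histogram estimates (the unimodal MLE of step 3 is not implemented) and that $\hat{V}_g(k)$ is increasing in $k$.

```python
import numpy as np


def inclusion_indicator(s2_R, s2_F, p_R, p_F, J_R, J_F, c_hat, J_min=30):
    """Step 4: I(k) = 1 unless one of the exclusion rules (a)-(d) applies.

    p_R, p_F are histogram estimates of X | G = g (ideally the unimodal MLE).
    """
    s2_R, s2_F = np.asarray(s2_R, float), np.asarray(s2_F, float)
    p_R, p_F = np.asarray(p_R, float), np.asarray(p_F, float)
    n = len(s2_R) - 1
    k = np.arange(n + 1)
    keep = np.ones(n + 1, dtype=bool)
    keep[[0, n]] = False                                  # (a) extreme valid scores
    keep &= (s2_R > 0) & (s2_F > 0)                       # (b) zero within-cell variance
    keep &= (J_R * p_R >= J_min) & (J_F * p_F >= J_min)   # (c) too few expected examinees
    keep &= k >= n * c_hat                                # (d) below the guessing level
    return keep.astype(int)


def endpoint_means(ybar_g, v_g, v_mid, I):
    """Step 6(d): Y*_gk = S_g(V_k) at k_l and k_r via piecewise-linear interpolation."""
    idx = np.flatnonzero(I)
    k_l, k_r = int(idx[0]), int(idx[-1])
    return {k: float(np.interp(v_mid[k], v_g, ybar_g)) for k in (k_l, k_r)}


n = 10
s2_F = np.full(n + 1, 0.2)
s2_R = s2_F.copy()
s2_R[5] = 0.0                                  # a zero-variance cell at k = 5
p = np.full(n + 1, 1.0 / (n + 1))
I = inclusion_indicator(s2_R, s2_F, p, p, J_R=1100, J_F=1100, c_hat=0.2)
print(I)  # [0 0 1 1 1 0 1 1 1 1 0]
```

In the toy call, $k = 0, 1$ fall below $n\hat{c} = 2$, $k = 5$ has a zero variance, and $k = 10$ is an extreme score, so $k_l = 2$ and $k_r = 9$.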






References 



Ackerman, T. (1991). A didactic explanation of item bias, item impact, and item validity 
from a multidimensional IRT perspective. Submitted for publication and presented 
at 1991 annual AERA/NCME joint meeting. 

Ansley, T.N. and Forsyth, R.A. (1985). An examination of the characteristics of uni- 
dimensional IRT parameter estimates derived from two-dimensional data. Applied 
Psychological Measurement 9, 37-48. 

Barlow, R., Bartholomew, D., Bremner, J., and Brunk, H. (1972). Statistical Inference
under Order Restrictions. New York: John Wiley.

Drasgow, F. (1987). A study of measurement bias of two standard psychological tests. 
Journal of Applied Psychology 72, 19-30. 

Hambleton, R.K. and Swaminathan, H. (1985). Item Response Theory: Principles and
Applications. Boston: Kluwer-Nijhoff Publishing.

Holland, P.W. and Thayer, D.T. (1988). Differential item functioning and the Mantel- 
Haenszel procedure. In H. Wainer and H.I. Braun (Eds.), Test Validity, (pp. 129-145). 
Hillsdale, New Jersey: Lawrence Erlbaum. 

Kok, F. (1988). Item Bias and Test Multidimensionality. In R. Langeheine and J. Rost
(Eds.), Latent Trait and Latent Class Models, (pp. 263-275). New York: Plenum Press.

Lautenschlager, G. and Park, D. (1988). IRT item bias detection procedures: issues of
model mis-specification, robustness, and parameter linking. Applied Psychological 
Measurement 12, 365-376. 

Linn, R.L. and Harnish, D. (1981). Interactions between item content and group member- 
ship on achievement test items. Journal of Educational Measurement 18, 109-118. 

Linn, R., Levine, M., Hastings, C., and Wardrop, J. (1981). Item bias on a test of reading
comprehension. Applied Psychological Measurement 5, 159-173. 

Lord, F.M. (1980). Applications of Item Response Theory to Practical Testing Problems. 
Hillsdale, New Jersey: Lawrence Erlbaum. 

Lord, F.M. and Novick, M.R. (1968). Statistical Theories of Mental Test Scores. Reading, 
Massachusetts: Addison-Wesley. 

Mellenbergh, G.J. (1982). Contingency table methods for assessing item bias. Journal of
Educational Statistics 7, 105-118. 

Millsap, R.E. and Meredith, W. (1989). The Detection of DIF: Why There is No Free 
Lunch. Paper presented at the Annual Meeting of the Psychometric Society, Univer- 
sity of California at Los Angeles, July 6-9, 1989. 

Mislevy, R.J. and Bock, R.D. (1984). Item operating characteristics of the Armed Services
Vocational Aptitude Battery (ASVAB), Form 8A. Office of Naval Research Technical Report
(N00014-83-C-0283).

Shealy, R.T. (1991). Assessment of the Shealy-Stout test bias statistic: a simulation study. 
In preparation. 






Shealy, R.T. (1989). An Item Response Theory-Based Statistical Procedure for Detecting 
Concurrent Internal Bias in Ability Tests. Unpublished doctoral dissertation, Univer- 
sity of Illinois, Urbana-Champaign. 

Shealy, R.T. and Stout, W.F. (1991). An Item Response Theory Model for Test Bias 
(Technical Report 4421-548 under ONR grant N00014-90-J-1940). Champaign, Ur- 
bana: Department of Statistics, University of Illinois (A 1989 version of this was widely 
distributed; it will appear, by invitation, in Differential Item Functioning, Theory and 
Practice, 1992, Hillsdale, New Jersey: Erlbaum.) 

Thissen, D., Steinberg, L., and Wainer, H. (1988). Use of item response theory in the 
study of group differences in trace lines. In H. Wainer and H.I. Braun (Eds.), Test 
Validity (pp. 147-169). Hillsdale, New Jersey: Lawrence Erlbaum. 

Zwick, R. (1990). When do item response function and Mantel-Haenszel definitions of 
differential item functioning coincide? Journal of Educational Statistics 15, 185-197. 



Figure 1. Stochastically ordered and unordered pairs of distributions. (Left panel: stochastically ordered; right panel: not stochastically ordered. Each panel plots a pair of conditional distribution functions of the form P[V ≤ v | θ].)
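
Figure 1 contrasts pairs of distributions that are and are not stochastically ordered. As a minimal sketch (not a procedure taken from this report), the Python below checks first-order stochastic ordering empirically for two samples: one distribution is stochastically smaller than another when its distribution function lies on or above the other's everywhere. Because empirical distribution functions only jump at observed values, comparing them at the pooled sample points is sufficient. The function names are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function v -> P[X <= v] of a sample."""
    xs = sorted(sample)
    n = len(xs)
    return lambda v: bisect_right(xs, v) / n


def stochastic_order(sample_a, sample_b):
    """Classify two samples by first-order stochastic ordering.

    Returns "A <=st B" when F_A(v) >= F_B(v) at every v (A tends to be smaller),
    "B <=st A" in the reverse case, "identical" when the ECDFs coincide, and
    "not ordered" when the ECDFs cross.
    """
    f_a, f_b = ecdf(sample_a), ecdf(sample_b)
    # ECDFs are step functions that change only at observed values.
    points = sorted(set(sample_a) | set(sample_b))
    a_above = any(f_a(v) > f_b(v) for v in points)
    b_above = any(f_b(v) > f_a(v) for v in points)
    if a_above and b_above:
        return "not ordered"
    if a_above:
        return "A <=st B"
    if b_above:
        return "B <=st A"
    return "identical"


print(stochastic_order([1, 2, 3], [2, 3, 4]))  # A <=st B
print(stochastic_order([2, 2, 2], [1, 3]))     # not ordered
```

The second call is the kind of pair Figure 1 labels "not stochastically ordered": the two distribution functions cross, so neither group is uniformly more likely to score low.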






Table 1: Means and standard deviations of the ASVAB and ACT item parameters used in the study.

Test               mean a   sd a   mean b   sd b   mean c   sd c    N
ASVAB auto/shop    1.22     0.7    0.09     0.72   0.20     0.06   25
ACT math           1.09     0.35   0.5      0.61   0.14     0.04   40



Table 2: Item parameters for the 2-dimensional items studied in the bias case.

n_b   Item No.   Parameter values
1     N          1.0    0.0   0.8   0.0   c
3     N-2        0.6   -0.3   0.4   0.0   c
3     N-1        0.8    0.0   0.4   0.0   c
3     N          1.0    0.3   0.4   0.0





Table 3: Equivalence table for bias potential and actual test bias.

n_b   bias potential   a_η   β_u
1     0.0              -     0
1     0.2              0.8   0.03
1     0.3              0.8   0.05
3     0.0              -     0
3     0.2              0.4   0.06
3     0.3              0.4   0.09
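
Table 3 pairs a bias potential with the actual test bias β_u it produces. The Python below is a hypothetical sketch of how such a mapping can be computed numerically; it is not the model used in this report. It assumes a compensatory two-dimensional logistic item response function in the target ability θ and a nuisance ability η, takes η given θ to be normal with unit variance, lets the reference group's conditional η mean exceed the focal group's by `eta_shift` (a stand-in for bias potential), and averages the resulting gap between the θ-conditional response probabilities over a standard normal focal θ distribution. The 1.7 scaling constant, the grid quadrature, the item parameters, and all names are assumptions.

```python
import math


def irf_2d(theta, eta, a_theta, a_eta, b, c=0.0):
    """Assumed compensatory 2-D logistic IRF with lower asymptote c."""
    z = 1.7 * (a_theta * theta + a_eta * eta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def normal_pdf(x, mean=0.0):
    """Unit-variance normal density."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


# Equally spaced quadrature grid on [-6, 6]; Gaussian mass beyond it is negligible.
STEP = 0.05
GRID = [-6.0 + i * STEP for i in range(241)]


def irf_given_theta(theta, eta_mean, item):
    """E[P(theta, eta) | theta] when eta | theta ~ N(eta_mean, 1)."""
    return STEP * sum(irf_2d(theta, e, *item) * normal_pdf(e, eta_mean) for e in GRID)


def beta_u(item, eta_shift):
    """Average over focal theta ~ N(0, 1) of P_R(theta) - P_F(theta)."""
    total = 0.0
    for t in GRID:
        gap = irf_given_theta(t, eta_shift, item) - irf_given_theta(t, 0.0, item)
        total += gap * normal_pdf(t) * STEP
    return total


item = (1.0, 0.8, 0.0, 0.0)  # illustrative (a_theta, a_eta, b, c), not read from Table 2
for shift in (0.0, 0.2, 0.3):
    print(shift, round(beta_u(item, shift), 3))
```

Under these assumptions β_u is exactly zero when the groups share the same conditional η distribution and grows with the shift, which mirrors the qualitative pattern in Table 3 without reproducing its numbers.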



Table 4: Equivalence of Δ_MH and β_u when n_b = 1, using item parameters of Table 2.

bias potential   c's used     Δ_MH   β_u
0.0              -            0      0
0.2              0.0          .27    0.034
0.2              actual c's   .27    0.026
0.3              0.0          .40    0.051
0.3              actual c's   .39    0.039
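
Table 4 relates Δ_MH to β_u, and the single-item tables also report Mantel-Haenszel results. For reference, the Python below sketches the standard Mantel-Haenszel computations on stratified 2×2 tables: the common odds ratio α̂_MH, its transformation to the ETS delta scale, Δ_MH = -2.35 ln α̂_MH, and the continuity-corrected MH chi-square. This follows the usual Holland-Thayer formulation rather than anything stated in this section; the sign convention for Δ_MH varies between authors, and the one used in the tables is not stated here. The function name and input layout are assumptions.

```python
import math


def mantel_haenszel(strata):
    """Mantel-Haenszel DIF statistics from per-score-level 2x2 tables.

    strata: iterable of (A, B, C, D) counts at each matching-score level, where
    A, B = reference group right/wrong and C, D = focal group right/wrong.
    Returns (alpha_mh, delta_mh, chi_square).
    """
    num = den = 0.0
    obs_a = exp_a = var_a = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # too few examinees at this level to contribute a variance
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d
        m_right, m_wrong = a + c, b + d
        obs_a += a
        exp_a += n_ref * m_right / n
        var_a += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
    alpha_mh = num / den
    delta_mh = -2.35 * math.log(alpha_mh)  # ETS delta scale
    chi_square = (abs(obs_a - exp_a) - 0.5) ** 2 / var_a
    return alpha_mh, delta_mh, chi_square


# Hypothetical counts for two score levels in which the reference group does better.
print(mantel_haenszel([(30, 10, 20, 20), (40, 5, 30, 15)]))
```

With this convention an odds ratio above 1 (the reference group favored at matched scores) gives a negative Δ_MH.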



Table 5: No bias, ACT, n_b = 1, α = 0.05.

J_F    J_R    c   d     MH    SIB
1500   1500   0   .0    .03   .07
1000   3000   0   .0    .00   .02
3000   3000   c   .0    .09   .06
1500   1500   0   .0    .04   .04
1000   3000   c   .5    .10   .10
3000   3000   c   .5    .05   .03
1500   1500   c   1.0   .02   .05
1000   3000   c   1.0   .05   .10
3000   3000   0   1.0   .06   .09
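
The no-bias tables report empirical rejection rates at a nominal α = 0.05, so each entry is a Monte Carlo estimate whose precision depends on the number of replications per cell. That number is not given in this section, so the Python sketch below takes it as a parameter; the value 100 in the usage lines is purely hypothetical. It gives a normal-approximation confidence interval for an observed rate and a one-sided z-test of whether the rate exceeds α.

```python
import math


def rate_interval(rate, replications, z=1.96):
    """Approximate 95% confidence interval for an empirical rejection rate."""
    se = math.sqrt(rate * (1.0 - rate) / replications)
    return max(0.0, rate - z * se), min(1.0, rate + z * se)


def exceeds_nominal(rate, replications, alpha=0.05, z_crit=1.645):
    """One-sided z-test of H0: true rejection rate = alpha, using the null SE."""
    se0 = math.sqrt(alpha * (1.0 - alpha) / replications)
    return (rate - alpha) / se0 > z_crit


# Hypothetical: 100 replications per cell (not stated in this section).
for observed in (0.03, 0.07, 0.09, 0.16):
    print(observed, rate_interval(observed, 100), exceeds_nominal(observed, 100))
```

Which table entries count as elevated therefore depends on the replication count: the fewer the replications, the wider the band around .05 that is consistent with correct Type I error control.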



Table 6: No bias, ACT, n_b = 3, α = 0.05.

J_F    J_R    c   d     SIB
1500   1500   0   .0    .05
1000   3000   0   .0    .02
3000   3000   c   .0    .07
1500   1500   0   .5    .08
1000   3000   c   .5    .07
3000   3000   0   .5    .05
1500   1500   c   1.0   .06
1000   3000   c   1.0   .16
3000   3000   0   1.0   .09



Table 7: No bias, ASVAB, n_b = 1, α = 0.05.

J_F    J_R    c   d     MH    SIB
1500   1500   0   .0    .08   .07
1000   3000   0   .0    .04   .04
3000   3000   c   .0    .06   .06
1500   1500   0   .5    .13   .14
1000   3000   c   .5    .04   .03
3000   3000   c   .5    .05   .04
1500   1500   c   1.0   .07   .02
1000   3000   c   1.0   .15   .09
3000   3000   0   1.0   .11   .01






Table 8: No bias, ASVAB, n_b = 3, α = 0.05.

J_F    J_R    c   d     SIB
1500   1500   ?   .0    ?
1000   3000   ?   .0    ?
3000   3000   c   .0    ?
1500   1500   ?   .0    ?
1000   3000   ?   .5    .06
3000   3000   0   .5    .05
1500   1500   c   1.0   .15
1000   3000   c   1.0   .07
3000   3000   0   1.0   .04



Table 9: Bias, a_η = 0.8, ACT, n_b = 1, α = 0.05.

J_F    J_R    c   d    bias potential   β_u    β̂_u    Δ_MH   MH    SIB
1500   1500   c   0    .2               .026   .032   .27    .46   .58
1000   3000   0   0    .2               .032   .042   .27    .64   .70
3000   3000   0   0    .2               .032   .035   .27    .91   .95
1500   1500   c   .5   .2               .029   .035   .27    .51   .60
1000   3000   0   .5   .2               .034   .044   .27    .65   .72
3000   3000   0   .5   .2               .034   .038   .27    .91   .94
1500   1500   0   0    .3               .048   .052   .40    .84   .90
1000   3000   c   0    .3               .042   .053   .40    .87   .91
3000   3000   c   0    .3               .042   .045   .40    .97   1.00
1500   1500   0   .5   .3               .050   .047   .40    .99   .99
1000   3000   c   .5   .3               .042   .054   .40    .80   .84
3000   3000   c   .5   .3               .042   .064   .40    .91   .92
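
The bias tables report β_u, the quantity the SIB statistic targets, together with SIB rejection rates. The Python below is a minimal sketch of the uncorrected form of an SIB-type statistic, assuming examinees are matched on a valid-subtest score k: β̂_u is a weighted sum over score levels of the reference-minus-focal difference in mean studied-item score, and dividing by its estimated standard error gives a statistic referred to the standard normal. It deliberately omits the correction for target ability differences that the procedure in this report applies, and the pooled-proportion weights and minimum cell size are assumptions, so it illustrates the idea rather than reproducing the authors' SIB.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def sib_uncorrected(ref, foc, min_per_level=2):
    """Uncorrected SIB-type statistic (no regression correction).

    ref, foc: lists of (matching_score, studied_score) pairs, one per examinee,
    where studied_score is the score on the suspect item(s).
    Returns (beta_hat, standard_error, z).
    """
    groups = {"R": defaultdict(list), "F": defaultdict(list)}
    for label, data in (("R", ref), ("F", foc)):
        for k, y in data:
            groups[label][k].append(y)
    # Use only score levels with enough examinees in both groups.
    levels = [k for k in groups["R"]
              if len(groups["R"][k]) >= min_per_level
              and len(groups["F"].get(k, [])) >= min_per_level]
    if not levels:
        raise ValueError("no score level has enough examinees in both groups")
    total = sum(len(groups["R"][k]) + len(groups["F"][k]) for k in levels)
    beta_hat = var_hat = 0.0
    for k in levels:
        y_r, y_f = groups["R"][k], groups["F"][k]
        p_k = (len(y_r) + len(y_f)) / total  # assumed pooled weights
        beta_hat += p_k * (mean(y_r) - mean(y_f))
        var_hat += p_k ** 2 * (variance(y_r) / len(y_r) + variance(y_f) / len(y_f))
    se = math.sqrt(var_hat)
    z = beta_hat / se if se > 0 else float("nan")
    return beta_hat, se, z


# Hypothetical data: five score levels, reference group slightly better on the studied item.
ref = [(k, y) for k in range(5) for y in (0, 1, 1, 1)]
foc = [(k, y) for k in range(5) for y in (0, 0, 1, 1)]
print(sib_uncorrected(ref, foc))
```

Without the correction, a true difference in target ability between the groups can inflate or deflate β̂_u even when no item is biased, which is the problem the report's correction addresses.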



Table 10: Bias, a_η = 0.4, ACT, n_b = 3, α = 0.05.

J_F    J_R    c   d    bias potential   β_u    β̂_u    SIB
1500   1500   0   0    .2               .063   .069   .70
1000   3000   c   0    .2               .053   .067   .68
3000   3000   c   0    .2               .053   .053   .80
1500   1500   c   .5   .2               .055   .071   .60
1000   3000   0   .5   .2               .065   .083   .72
3000   3000   0   .5   .2               .065   .074   .96
1500   1500   0   0    .3               .093   .095   .91
1000   3000   0   0    .3               .093   .11    .89
3000   3000   c   0    .3               .080   .081   .99
1500   1500   0   .5   .3               .097   .12    .97
1000   3000   c   .5   .3               .084   .11    .89
3000   3000   c   .5   .3               .083   .09    1.00






Table 11: Bias, a_η = 0.8, ASVAB, n_b = 1, α = 0.05.

J_F    J_R    c   d    bias potential   β_u    β̂_u    Δ_MH   MH    SIB
1500   1500   c   0    .2               .026   .029   .27    .42   .50
1000   3000   0   0    .2               .034   .039   .27    .63   .79
3000   3000   0   0    .2               .034   .034   .27    .90   .95
1500   1500   c   .5   .2               .027   .035   .27    .63   .66
1000   3000   0   .5   .2               .034   .038   .27    .63   .70
3000   3000   0   .5   .2               .034   .036   .27    .89   .91
1500   1500   0   0    .3               .051   .052   .40    .85   .92
1000   3000   c   0    .3               .042   .044   .40    .77   .84
3000   3000   c   0    .3               .042   .046   .40    .99   .99
1500   1500   0   .5   .3               .051   .057   .40    .91   .93
1000   3000   c   .5   .3               .038   .048   .40    .77   .82
3000   3000   c   .5   .3               .039   .045   .40    .94   .97



Table 12: Bias, a_η = 0.4, ASVAB, n_b = 3, α = 0.05.

J_F    J_R    c   d    bias potential   β_u    β̂_u    SIB
1500   1500   0   0    .2               .065   .067   .70
1000   3000   c   0    .2               .052   .056   .53
3000   3000   c   0    .2               .052   .053   .85
1500   1500   c   .5   .2               .052   .068   .63
1000   3000   0   .5   .2               .064   .083   .73
3000   3000   0   .5   .2               .064   .072   .92
1500   1500   0   0    .3               .098   .10    .94
1000   3000   0   0    .3               .097   .10    .97
3000   3000   c   0    .3               .079   .079   .98
1500   1500   0   .5   .3               .097   .011   .98
1000   3000   c   .5   .3               .076   .093   .87
3000   3000   c   .5   .3               .078   .090   .99






